TITLE OF INVENTION
HIGH MOLECULAR WEIGHT SURFACE PROTEINS
OF NON-TYPEABLE HAEMOPHILUS

FIELD OF INVENTION 
This invention relates to high molecular weight 
proteins of non-typeable Haemophilus.



BACKGROUND TO THE INVENTION

Non-typeable Haemophilus influenzae are non-encapsulated
organisms that are defined by their lack of reactivity
with antisera against known H. influenzae capsular
antigens.

These organisms commonly inhabit the upper
respiratory tract of humans and are frequently
responsible for a variety of common mucosal surface
infections, such as otitis media, sinusitis,
conjunctivitis, chronic bronchitis and pneumonia. Otitis
media remains an important health problem for children:
most children have had at least one episode of otitis
by their third birthday, and approximately one-third of
children have had three or more episodes. Non-typeable
Haemophilus influenzae generally accounts for about 20 to
25% of acute otitis media and for a larger percentage of
cases of chronic otitis media with effusion.

A critical first step in the pathogenesis of these
infections is colonization of the respiratory tract
mucosa. Bacterial surface molecules which mediate
adherence, therefore, are of particular interest as
possible vaccine candidates.

Since the non-typeable organisms do not have a
polysaccharide capsule, they are not controlled by the
present Haemophilus influenzae type b (Hib) vaccines,
which are directed towards Hib bacterial capsular
polysaccharides. The non-typeable strains, however, do
produce surface antigens that can elicit bactericidal
antibodies. Two of the major outer membrane proteins, P2
and P6, have been identified as targets of human serum
bactericidal activity. However, it has been shown that
the P2 protein sequence is variable, in particular in the
non-typeable Haemophilus strains. Thus, a P2-based
vaccine would not protect against all strains of the
organism.

There have previously been identified by Barenkamp
et al. (Pediatr. Infect. Dis. J., 9:333-339, 1990) a group
of high-molecular-weight (HMW) proteins of non-typeable
Haemophilus influenzae that appeared to be major targets
of antibodies present in human convalescent sera.
Examination of a series of middle ear isolates revealed
the presence of one or two such proteins in most strains.
However, prior to the present invention, the structures
of these proteins and their encoding nucleic acid
sequences were unknown, as were pure isolates of such
proteins. In addition, surface-accessible epitopes of
such proteins had not been identified.

SUMMARY OF INVENTION 
The inventor, in an effort to further characterize
the high molecular weight (HMW) non-typeable Haemophilus
proteins, has cloned, expressed and sequenced the genes
coding for two immunodominant HMW proteins (designated
HMW1 and HMW2) from a prototype non-typeable Haemophilus
strain and has cloned, expressed and sequenced the genes
coding for two additional immunodominant HMW proteins
(designated HMW3 and HMW4) from another non-typeable
Haemophilus strain.

In accordance with one aspect of the present
invention, therefore, there is provided an isolated and
purified nucleic acid molecule coding for a high
molecular weight protein of a non-typeable Haemophilus
strain, particularly a nucleic acid molecule coding for
protein HMW1, HMW2, HMW3 or HMW4, as well as any variant
or fragment of such protein which retains the
immunological ability to protect against disease caused
by a non-typeable Haemophilus strain.

The nucleic acid molecule may have a DNA sequence
shown in Figure 1 (SEQ ID No: 1) and encoding HMW1 for
strain 12 having the derived amino acid sequence of
Figure 2 (SEQ ID No: 2). The nucleic acid molecule may
have the DNA sequence shown in Figure 3 (SEQ ID No: 3)
and encoding protein HMW2 for strain 12 having the
derived amino acid sequence of Figure 4 (SEQ ID No: 4).
The nucleic acid molecule may have the DNA sequence shown
in Figure 8 (SEQ ID No: 7) and encoding HMW3 for strain
5 having the derived amino acid sequence of Figure 10
(SEQ ID No: 9). The nucleic acid molecule may have a DNA
sequence shown in Figure 9 (SEQ ID No: 8) and encoding
protein HMW4 for strain 5 having the derived amino acid
sequence of Figure 10 (SEQ ID No: 10).

In another aspect of the invention, there is
provided an isolated and purified nucleic acid molecule
encoding a high molecular weight protein of a
non-typeable Haemophilus strain, which is selected from the
group consisting of:

(a) a DNA sequence as shown in any one of Figures
1, 3, 8 and 9 (SEQ ID Nos: 1, 3, 7 and 8);

(b) a DNA sequence encoding an amino acid sequence
as shown in any one of Figures 2, 4 and 10 (SEQ ID
Nos: 2, 4, 9 and 10); and

(c) a DNA sequence which hybridizes under stringent
conditions to any one of the sequences of (a) and
(b).

A DNA sequence according to (c) may be one having at
least about 90% identity of sequence to the DNA sequences
(a) or (b).
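
As a rough illustration of the identity threshold mentioned above, the following Python sketch computes percent identity between two sequences that have already been aligned. The specification does not state how identity is to be calculated, so the convention used here (identical columns divided by all columns in which at least one sequence has a residue) is an assumption. The function name and the short toy sequences are also hypothetical and are not taken from any SEQ ID.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences carry a gap are
    ignored; every other column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1  # x == y here implies neither is a gap
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy sequences (illustrative only): one mismatch in ten columns.
toy_a = "ATGCATGCAT"
toy_b = "ATGCATGCAA"
identity = percent_identity(toy_a, toy_b)
meets_threshold = identity >= 90.0
```

Other tools divide by the shorter sequence length or by aligned columns without gaps, so a reported figure is only comparable when the convention is stated.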

The inventor has further found that correct processing
of the HMW protein requires the presence of additional
downstream nucleic acid sequences. Accordingly, a
further aspect of the present invention provides an
isolated and purified gene cluster comprising a first
nucleotide sequence encoding a high molecular weight
protein of a non-typeable Haemophilus strain and at least
one downstream nucleotide sequence for effecting
expression of a gene product of the first nucleotide
sequence fully encoded by the structural gene.

The gene cluster may comprise a DNA sequence
encoding high molecular weight protein HMW1 or HMW2 and
two downstream accessory genes. The gene cluster may
have the DNA sequence shown in Figure 6 (SEQ ID No: 5) or
Figure 7 (SEQ ID No: 6).

In an additional aspect, the present invention
includes a vector adapted for transformation of a host,
comprising a nucleic acid molecule as provided herein,
particularly the gene cluster provided herein. The
vector may be an expression vector or a plasmid adapted
for expression of the encoded high molecular weight
protein, fragments or analogs thereof, in a heterologous
or homologous host and comprising expression means
operatively coupled to the nucleic acid molecule. The
expression means may include a nucleic acid portion
encoding a leader sequence for secretion from the host of
the high molecular weight protein. The expression means
may include a nucleic acid portion encoding a lipidation
signal for expression from the host of a lipidated form
of the high molecular weight protein. The host may be
selected from, for example, E. coli, Bacillus,
Haemophilus, fungi, yeast, baculovirus and Semliki Forest
Virus expression systems. The invention further includes
a recombinant high molecular weight protein of
non-typeable Haemophilus or fragment or analog thereof
producible by the transformed host.

In another aspect, the invention provides an
isolated and purified high molecular weight protein of
non-typeable Haemophilus influenzae which is encoded by
a nucleic acid molecule as provided herein. Such high
molecular weight proteins may be produced recombinantly
to be devoid of non-high molecular weight proteins of
non-typeable Haemophilus influenzae or from natural
sources.

Such protein may be characterized by at least one
surface-exposed B-cell epitope which is recognized by
monoclonal antibody AD6 (ATCC ). Such protein may
be HMW1 encoded by the DNA sequence shown in Figure 1
(SEQ ID No: 1) and having the derived amino acid sequence
of Figure 2 (SEQ ID No: 2) and having an apparent
molecular weight of 125 kDa. Such protein may be HMW2
encoded by the DNA sequence shown in Figure 3 (SEQ ID No:
3) and having the derived amino acid sequence of Figure
4 (SEQ ID No: 4) and having an apparent molecular weight
of 120 kDa. Such protein may be HMW3 encoded by the DNA
sequence shown in Figure 8 (SEQ ID No: 7) and having the
derived amino acid sequence of Figure 10 (SEQ ID No: 9)
and having an apparent molecular weight of 125 kDa. Such
protein may be HMW4 encoded by the DNA sequence shown in
Figure 9 (SEQ ID No: 8) and having the derived amino acid
sequence shown in Figure 10 (SEQ ID No: 10) and having
the apparent molecular weight of 123 kDa.

A further aspect of the invention provides an
isolated and purified high molecular weight protein of
non-typeable Haemophilus influenzae which is
antigenically related to the filamentous hemagglutinin
surface protein of Bordetella pertussis, particularly
HMW1, HMW2, HMW3 or HMW4.

The novel high molecular weight proteins of
non-typeable Haemophilus may be used as carrier molecules by
linking to an antigen, hapten or polysaccharide for
eliciting an immune response to the antigen, hapten or
polysaccharide. An example of such polysaccharide is a
protective polysaccharide against Haemophilus influenzae
type b.

In a further aspect of the invention, there is
provided a synthetic peptide having an amino acid
sequence containing at least six amino acids and no more
than 150 amino acids and corresponding to at least one
protective epitope of a high molecular weight protein of
non-typeable Haemophilus influenzae, specifically HMW1,
HMW2, HMW3 or HMW4. The epitope may be one recognized by
at least one of the monoclonal antibodies AD6 (ATCC )
and 10C5 (ATCC ). Specifically, the epitope may be
located within 75 amino acids of the carboxy terminus of
the HMW1 or HMW2 protein and recognized by the monoclonal
antibody AD6.
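
The length limits and the carboxy-terminal window described above can be illustrated with a short Python sketch that enumerates every candidate peptide of six to 150 residues lying within the last 75 residues of a protein. This is only a way of visualizing the search space under stated assumptions: the function name, its parameters and the 100-residue toy sequence are hypothetical, and the toy sequence is not derived from SEQ ID No: 2 or SEQ ID No: 4.

```python
from typing import Iterator, Tuple


def c_terminal_peptides(
    protein: str, window: int = 75, min_len: int = 6, max_len: int = 150
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, peptide) for each peptide of allowed length that
    lies entirely within the last `window` residues of `protein`.

    Coordinates are 1-based and inclusive.
    """
    region_start = max(0, len(protein) - window)  # 0-based index
    for start in range(region_start, len(protein)):
        # A peptide cannot run past the carboxy terminus.
        longest = min(max_len, len(protein) - start)
        for length in range(min_len, longest + 1):
            yield start + 1, start + length, protein[start:start + length]


# Toy 100-residue sequence (illustrative only).
toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 5
candidates = list(c_terminal_peptides(toy_protein))
```

In practice only a small subset of such windows would be synthesized; the enumeration merely makes explicit that the 150-residue upper limit is never reached inside a 75-residue window.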

The present invention also provides an immunogenic
composition comprising an immunoeffective amount of an
active component, which may be the novel high molecular
weight protein or synthetic peptide provided herein,
which may be formulated along with a pharmaceutically
acceptable carrier therefor. The immunogenic composition
may be formulated as a vaccine for in vivo administration
to a host.

The immunogenic composition may be formulated as a
microparticle, capsule, ISCOM or liposome preparation.
The immunogenic composition may be used in combination
with a targeting molecule for delivery to specific cells
of the immune system or to mucosal surfaces. Some
targeting molecules include vitamin B12 and fragments of
bacterial toxins, as described in WO 92/17167 (Biotech
Australia Pty. Ltd.), and monoclonal antibodies, as
described in U.S. Patent No. 5,194,254 (Barber et al).
The immunogenic compositions of the invention (including
vaccines) may further comprise at least one other
immunogenic or immunostimulating material and the
immunostimulating material may be at least one adjuvant.
Suitable adjuvants for use in the present invention
include (but are not limited to) aluminum phosphate,
aluminum hydroxide, QS21, Quil A, derivatives and
components thereof, ISCOM matrix, calcium phosphate,
calcium hydroxide, zinc hydroxide, a glycolipid analog,
an octadecyl ester of an amino acid, a muramyl dipeptide,
polyphosphazene, ISCOPREP, DC-chol, DDBA and a lipoprotein
and other adjuvants to induce a Th1 response.
Advantageous combinations of adjuvants are described in
copending United States Patent Application Serial No.
08/261,194 filed June 16, 1994, assigned to Connaught
Laboratories Limited and the disclosure of which is
incorporated herein by reference.

In a further aspect of the invention, there is
provided a method of generating an immune response in a
host, comprising administering thereto an
immunoeffective amount of the immunogenic composition as
provided herein. The immune response may be a humoral or
a cell-mediated immune response. Hosts in which
protection against disease may be conferred include
primates including humans.

The present invention additionally provides a method
of producing antibodies specific for a high molecular
weight protein of non-typeable Haemophilus influenzae,
comprising:

(a) administering the high molecular weight protein
or epitope-containing peptide provided herein to at least
one mouse to produce at least one immunized mouse;

(b) removing B-lymphocytes from the at least one
immunized mouse;

(c) fusing the B-lymphocytes from the at least one
immunized mouse with myeloma cells, thereby producing
hybridomas;

(d) cloning the hybridomas;

(e) selecting clones which produce anti-high
molecular weight protein antibody;

(f) culturing the anti-high molecular weight
protein antibody-producing clones; and then

(g) isolating anti-high molecular weight protein
antibodies from the cultures.

Additional aspects of the present invention include 
monoclonal antibody AD6 and monoclonal antibody 10C5. 

The present invention provides, in an additional
aspect thereof, a method for producing an immunogenic
composition, comprising administering the immunogenic
composition provided herein to a first test host to
determine an amount and a frequency of administration
thereof to elicit a selected immune response against a
high molecular weight protein of non-typeable Haemophilus
influenzae; and formulating the immunogenic composition
in a form suitable for administration to a second host in
accordance with the determined amount and frequency of
administration. The second host may be a human.

The novel envelope protein provided herein is useful
in diagnostic procedures and kits for detecting
antibodies to high molecular weight proteins of
non-typeable Haemophilus influenzae. Further, monoclonal
antibodies specific for the high molecular protein or
epitopes thereof are useful in diagnostic procedures and
kits for detecting the presence of the high molecular
weight protein.

Accordingly, a further aspect of the invention
provides a method of determining the presence, in a
sample, of antibodies specifically reactive with a high
molecular weight protein of Haemophilus influenzae
comprising the steps of:

(a) contacting the sample with the high molecular
weight protein or epitope-containing peptide as
provided herein to produce complexes comprising the
protein and any said antibodies present in the
sample specifically reactive therewith; and

(b) determining production of the complexes.

In a further aspect of the invention, there is
provided a method of determining the presence, in a
sample, of a high molecular weight protein of Haemophilus
influenzae or an epitope-containing peptide, comprising
the steps of:

(a) immunizing a host with the protein or peptide
as provided herein, to produce antibodies specific
for the protein or peptide;

(b) contacting the sample with the antibodies to
produce complexes comprising any high molecular
weight protein or epitope-containing peptide present
in the sample and said specific antibodies; and

(c) determining production of the complexes.

A further aspect of the invention provides a
diagnostic kit for determining the presence of antibodies
in a sample specifically reactive with a high molecular
weight protein of non-typeable Haemophilus influenzae or
epitope-containing peptide, comprising:

(a) the high molecular weight protein or
epitope-containing peptide as provided herein;

(b) means for contacting the protein or peptide
with the sample to produce complexes comprising the
protein or peptide and any said antibodies present
in the sample; and

(c) means for determining production of the
complexes.

The invention also provides a diagnostic kit for
detecting the presence, in a sample, of a high molecular
weight protein of Haemophilus influenzae or
epitope-containing peptide, comprising:

(a) an antibody specific for the novel envelope
protein as provided herein;

(b) means for contacting the antibody with the
sample to produce a complex comprising the protein
or peptide and protein-specific antibody; and

(c) means for determining production of the
complex.

In this application, the term "high molecular weight
protein" is used to define a family of high molecular
weight proteins of Haemophilus influenzae, generally
having an apparent molecular weight of from about 120 to
about 130 kDa and includes proteins having variations in
their amino acid sequences. In this application, a first
protein or peptide is a "functional analog" of a second
protein or peptide if the first protein or peptide is
immunologically related to and/or has the same function
as the second protein or peptide. The functional analog
may be, for example, a fragment of the protein or a
substitution, addition or deletion mutant thereof. The
invention also extends to such functional analogs.

Advantages of the present invention include:

- an isolated and purified envelope high molecular
weight protein of Haemophilus influenzae produced
recombinantly to be devoid of non-high molecular weight
proteins of Haemophilus influenzae or from natural
sources as well as nucleic acid molecules encoding the
same;

- high molecular weight protein specific human
monoclonal antibodies which recognize conserved epitopes
in such protein; and

- diagnostic kits and immunological reagents for
specific identification of hosts infected by Haemophilus
influenzae.

BRIEF DESCRIPTION OF DRAWINGS

Figures 1A to 1G contain the DNA sequence of a gene
coding for protein HMW1 (SEQ ID No: 1). The hmw1A open
reading frame extends from nucleotides 351 to 4958;

Figures 2A and 2B contain the derived amino acid
sequence of protein HMW1 (SEQ ID No: 2);

Figures 3A to 3G contain the DNA sequence of a gene
coding for protein HMW2 (SEQ ID No: 3). The hmw2A
open reading frame extends from nucleotides 382 to 4782;

Figures 4A and 4B contain the derived amino acid
sequence of HMW2 (SEQ ID No: 4);

Figure 5A shows restriction maps of representative
recombinant phages which contained the HMW1 or HMW2
structural genes and of HMW1 plasmid subclones. The
shaded boxes indicate the location of the structural
genes. In the recombinant phage, transcription proceeds
from left to right for the HMW1 gene and from right to
left for the HMW2 gene;

Figure 5B shows the restriction map of the T7
expression vector pT7-7. This vector contains the T7 RNA
polymerase promoter Φ10, a ribosomal binding site (rbs)
and the translational start site for the T7 gene 10
protein upstream from a multiple cloning site;

Figures 6A to 6L contain the DNA sequence of a gene
cluster for the hmw1 gene (SEQ ID NO: 5), comprising
nucleotides 351 to 4958 (ORF a) (as in Figure 1), as well
as two additional downstream genes in the 3' flanking
region, comprising ORFs b, nucleotides 5114 to 6748, and
c, nucleotides 7062 to 9011;

Figures 7A to 7L contain the DNA sequence of a gene
cluster for the hmw2 gene (SEQ ID NO: 6), comprising
nucleotides 792 to 5222 (ORF a) (as in Figure 3), as well
as two additional downstream genes in the 3' flanking
region, comprising ORFs b, nucleotides 5375 to 7009, and
c, nucleotides 7249 to 9198;

Figures 8A and 8B contain the DNA sequence of a gene
coding for protein HMW3 (SEQ ID NO: 7);

Figures 9A and 9B contain the DNA sequence of a gene
coding for protein HMW4 (SEQ ID NO: 8);

Figures 10A to 10L contain a comparison table for
the derived amino acid sequence for proteins HMW1 (SEQ ID
No: 2), HMW2 (SEQ ID No: 4), HMW3 (SEQ ID No: 9) and HMW4
(SEQ ID No: 10);

Figure 11 illustrates a Western immunoblot assay of
phage lysates containing either the HMW1 or HMW2
recombinant proteins. Lysates were probed with an
E. coli-absorbed adult serum sample with high-titer antibody
against high molecular weight proteins. The arrows
indicate the major immunoreactive bands of 125 and 120
kDa in the HMW1 and HMW2 lysates respectively;

Figure 12 is a Western immunoblot assay of cell
sonicates prepared from E. coli transformed with plasmid
pT7-7 (lanes 1 and 2), pHMW1-2 (lanes 3 and 4), pHMW1-4
(lanes 5 and 6) or pHMW1-14 (lanes 7 and 8). The
sonicates were probed with an E. coli-absorbed adult
serum sample with high-titer antibody against
high-molecular weight proteins. Lanes labelled U and I
represent sonicates prepared before and after induction
of the growing samples with IPTG, respectively. The
arrows indicate protein bands of interest as discussed
below;

Figure 13 is a graphical illustration of an ELISA
with rHMW1 antiserum assayed against purified filamentous
haemagglutinin of B. pertussis. Ab = antibody;

Figure 14 is a Western immunoblot assay of cell
sonicates from a panel of epidemiologically unrelated
non-typeable H. influenzae strains. The sonicates were
probed with rabbit antiserum prepared against HMW1-4
recombinant protein. The strain designations are
indicated by the numbers below each line;

Figure 15 is a Western immunoblot assay of cell
sonicates from a panel of epidemiologically unrelated
non-typeable H. influenzae strains. The sonicates were
probed with monoclonal antibody X3C, a murine IgG
antibody which recognizes the filamentous hemagglutinin
of B. pertussis. The strain designations are indicated
by the numbers below each line;

Figure 16 shows an immunoblot assay of cell
sonicates of non-typeable H. influenzae strain 12
derivatives. The sonicates were probed with rabbit
antiserum prepared against HMW1 recombinant protein.
Lanes: 1, wild-type strain; 2, HMW2- mutant; 3, HMW1-
mutant; 4, HMW1- HMW2- double mutant;

Figure 17 shows middle ear bacterial counts in
PBS-immunized control animals (left panel) and
HMW1/HMW2-immunized animals (right panel) seven days after middle
ear inoculation with non-typeable Haemophilus influenzae
strain 12. Data are log-transformed and the horizontal
lanes indicate the means and standard deviations of
middle ear fluid bacterial counts for only the infected
animals in each group;

Figure 18 is a schematic diagram of pGEMEX-hmw1
recombinant plasmids. The restriction enzymes are
B-BamHI, E-EcoRI, C-ClaI, RV-EcoRV, Bst-BstEII and
H-HindIII;

Figure 19 is a schematic diagram of pGEMEX-hmw2
recombinant plasmids. The restriction enzymes are
E-EcoRI, H-HindIII, Hc-HincII, M-MluI and X-XhoI;

Figure 20 is an immunoelectron micrograph of
representative non-typeable Haemophilus influenzae
strains after incubation with monoclonal antibody AD6
followed by incubation with goat anti-mouse IgG
conjugated with 10-nm colloidal gold particles. Strains
are: upper left panel-strain 12; upper right panel-strain
12 mutant deficient in expression of the high molecular
weight proteins; lower left panel-strain 5; lower right
panel-strain 15;

Figure 21 is a Western immunoblot assay with MAb AD6
and HMW1 or HMW2 recombinant proteins. The upper left
panel indicates the segments of hmw1A or hmw2A structural
genes which are being expressed in the recombinant
proteins. The lane numbers correspond to the indicated
segments;

Figure 22 is a Western immunoblot assay with MAb
10C5 and HMW1 or HMW2 recombinant proteins. The upper
panel indicates the segments of the hmw1A or hmw2A
structural genes which are being expressed in the
recombinant proteins. The lane numbers correspond to the
indicated segments; and

Figure 23 is a Western immunoblot assay with MAb AD6
and a panel of unrelated non-typeable Haemophilus
influenzae strains which express HMW1/HMW2-like protein.
Cell sonicates were prepared from freshly grown samples
of each strain prior to analysis in the Western blot.

GENERAL DESCRIPTION OF INVENTION
The DNA sequences of the genes coding for the HMW1
and HMW2 proteins of non-typeable Haemophilus influenzae
strain 12, shown in Figures 1 and 3 respectively, were
shown to be about 80% identical, with the first 1259 base
pairs of the genes being identical. The open reading
frames extend from nucleotides 351 to 4958 and from
nucleotides 382 to 4782, respectively. The derived amino
acid sequences of the two HMW proteins, shown in Figures
2 and 4 respectively, are about 70% identical.
Furthermore, the encoded proteins are antigenically
related to the filamentous hemagglutinin surface protein
of Bordetella pertussis. A monoclonal antibody prepared
against filamentous hemagglutinin (FHA) of Bordetella
pertussis was found to recognize both of the high
molecular weight proteins. These data suggest that the
HMW and FHA proteins may serve similar biological
functions. The derived amino acid sequences of the HMW1
and HMW2 proteins show sequence similarity to that for
the FHA protein. It has further been shown that these
antigenically-related proteins are produced by the
majority of the non-typeable strains of Haemophilus.
Antiserum raised against the protein expressed by the HMW1
gene recognizes both the HMW2 protein and the B.
pertussis FHA. The present invention includes an
isolated and purified high molecular weight protein of
non-typeable Haemophilus which is antigenically related
to the B. pertussis FHA and which may be obtained from
natural sources or produced recombinantly.

A phage genomic library of a known strain of
non-typeable Haemophilus was prepared by standard methods
and the library was screened for clones expressing high
molecular weight proteins, using a high titre antiserum
against HMW's. A number of strongly reactive DNA clones
were plaque-purified and sub-cloned into a T7 expression
plasmid. It was found that they all expressed either one
or the other of the two high-molecular-weight proteins
designated HMW1 and HMW2, with apparent molecular weights
of 125 and 120 kDa, respectively, encoded by open reading
frames of 4.6 kb and 4.4 kb, respectively.

Representative clones expressing either HMW1 or HMW2
were further characterized and the genes isolated,
purified and sequenced. The DNA sequence of HMW1 is
shown in Figure 1 and the corresponding derived amino
acid sequence in Figure 2. Similarly, the DNA sequence of
HMW2 is shown in Figure 3 and the corresponding derived
amino acid sequence in Figure 4. Partial purification of
the isolated proteins and N-terminal sequence analysis
indicated that the expressed proteins are truncated since
their sequence starts at residue number 442 of both full
length HMW1 and HMW2 gene products.

Subcloning studies with respect to the hmw1 and hmw2
genes indicated that correct processing of the HMW
proteins required the products of additional downstream
genes. It has been found that both the hmw1 and hmw2
genes are flanked by two additional downstream open
reading frames (ORFs), designated b and c, respectively
(see Figures 6 and 7).

The b ORFs are 1635 bp in length, extending from
nucleotides 5114 to 6748 in the case of hmw1 and
nucleotides 5375 to 7009 in the case of hmw2, with their
derived amino acid sequences being 99% identical. The
derived amino acid sequences demonstrate similarity with
the derived amino acid sequences of two genes which
encode proteins required for secretion and activation of
hemolysins of P. mirabilis and S. marcescens.

The c ORFs are 1950 bp in length, extending from
nucleotides 7062 to 9011 in the case of hmw1 and
nucleotides 7249 to 9198 in the case of hmw2, with their
derived amino acid sequences 96% identical. The hmw1 c
ORF is preceded by a series of 9 bp direct tandem
repeats. In plasmid subclones, interruption of the hmw1
b or c ORF results in defective processing and secretion
of the hmw1 structural gene product.
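
The ORF lengths quoted in this description follow from the stated nucleotide coordinates when both endpoints are counted. The Python sketch below checks that arithmetic for the a, b and c ORFs. The dictionary layout and function name are illustrative assumptions. The a-ORF coordinates are those given for Figures 1 and 3, and the b and c coordinates are those given for the gene clusters of Figures 6 and 7.

```python
# Coordinates (start, end) are 1-based and inclusive, as quoted in the text.
ORFS = {
    ("hmw1", "a"): (351, 4958),   # Figure 1 numbering
    ("hmw1", "b"): (5114, 6748),  # Figure 6 numbering
    ("hmw1", "c"): (7062, 9011),  # Figure 6 numbering
    ("hmw2", "a"): (382, 4782),   # Figure 3 numbering
    ("hmw2", "b"): (5375, 7009),  # Figure 7 numbering
    ("hmw2", "c"): (7249, 9198),  # Figure 7 numbering
}


def orf_length(start: int, end: int) -> int:
    """Length in base pairs of an ORF with inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


lengths = {key: orf_length(*coords) for key, coords in ORFS.items()}

for (gene, orf), bp in sorted(lengths.items()):
    # A complete ORF should contain a whole number of codons.
    whole_codons = bp % 3 == 0
    print(f"{gene} ORF {orf}: {bp} bp ({bp / 1000:.1f} kb), "
          f"whole codons: {whole_codons}")
```

The computed values agree with the 4.6 kb and 4.4 kb structural genes and with the 1635 bp and 1950 bp downstream ORFs stated in the text.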

The two high molecular weight proteins HMW1 and HMW2
have been isolated and purified by the procedures
described below in the Examples and shown to be
protective against otitis media in chinchillas and to
function as adhesins. These results indicate the
potential for use of such high molecular proteins and
structurally-related proteins of other non-typeable
strains of Haemophilus influenzae as components in
immunogenic compositions for protecting a susceptible
host, such as a human infant, against disease caused by
infection with non-typeable Haemophilus influenzae.

Since the proteins provided herein are good
cross-reactive antigens and are present in the majority
of non-typeable Haemophilus strains, it is evident that
these HMW proteins may become integral constituents of a
universal Haemophilus vaccine. Indeed, these proteins
may be used not only as protective antigens against
otitis, sinusitis and bronchitis caused by the
non-typeable Haemophilus strains, but also may be used as
carriers for the protective Hib polysaccharides in a
conjugate vaccine against meningitis. The proteins also
may be used as carriers for other antigens, haptens and
polysaccharides from other organisms, so as to induce
immunity to such antigens, haptens and polysaccharides.

The nucleotide sequences encoding two high molecular
weight proteins of a different non-typeable Haemophilus
strain (designated HMW3 and HMW4), namely strain 5, have
been elucidated, and are presented in Figures 8 and 9
(SEQ ID Nos: 7 and 8). HMW3 has an apparent molecular
weight of 125 kDa while HMW4 has an apparent molecular
weight of 123 kDa. These high molecular weight proteins
are antigenically related to the HMW1 and HMW2 proteins
and to FHA. Figure 10 contains a sequence
comparison of the derived amino acid sequences for the
four high molecular weight proteins identified herein
(HMW1, SEQ ID No: 2; HMW2, SEQ ID No: 4; HMW3, SEQ ID No:
9; HMW4, SEQ ID No: 10). As may be seen from this
comparison, stretches of identical amino acid sequence
may be found throughout the length of the comparison,
with HMW3 more closely resembling HMW1 and HMW4 more
suggestive of a considerable sequence homology between 
30 high molecular weight proteins from various non-typeable 
HagffophiJ-ys strains. This information is also suggestive 
that the HMW3 and HMW4 proteins will have the same 
immunological properties as the HMWl and HMW2 proteins 
and that corresponding HMW proteins from other non- 
35 typeable HaeyapphjUus strains will have the same 
immunological properties as the HMWl and HMW2 proteins. 
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In addition^ mutants of non-typcable H. influenz^ ^ 
strains that are deficient in expression of HMWl or HHW2 
or both have been constructed and ex2uziined for their 
capacity to adhere to cultured human epithelial cells. 
The hmw1 and hmw2 gene clusters have been expressed in E.
coli and have been examined for in vitro adherence. The
results of such experimentation, described below,
demonstrate that both HMW1 and HMW2 mediate attachment
and hence are adhesins and that this function is present
even in the absence of other H. influenzae surface
structures. The ability of a bacterial surface protein
to function as an adhesin provides strong in vitro
evidence for its potential role as a protective antigen.
In view of the considerable sequence homology between the
HMW3 and HMW4 proteins and the HMW1 and HMW2 proteins,
these results indicate that HMW3 and HMW4 also are likely
to function as adhesins and that other HMW proteins of
other strains of non-typeable Haemophilus influenzae
similarly are likely to function as adhesins. This
expectation is borne out by the results described in the
Examples below.

With the isolation and purification of the high
molecular weight proteins, the inventor is able to
determine the major protective epitopes of the proteins
by conventional epitope mapping and to synthesize peptides
corresponding to these determinants for incorporation
into fully synthetic or recombinant vaccines.
Accordingly, the invention also comprises a synthetic
peptide having at least six and no more than 150 amino
acids and having an amino acid sequence corresponding to
at least one protective epitope of a high molecular
weight protein of a non-typeable Haemophilus influenzae.
Such peptides are of varying length and constitute
portions of the high molecular weight proteins that can
be used to induce immunity, either directly or as part of
a conjugate, against the respective organisms and thus
constitute active components of immunogenic compositions
for protection against the corresponding diseases.

In particular, the applicant has sought to identify
regions of the high molecular weight proteins which are
demonstrated experimentally to be surface-exposed B-cell
epitopes and which are common to all or at least a large
number of non-typeable strains of Haemophilus influenzae.
The strategy which has been adopted by the inventor has
been to:

(a) generate a panel of monoclonal antibodies
reactive with the high molecular weight proteins;

(b) screen those monoclonal antibodies for
reactivity with surface epitopes of intact bacteria
using immunoelectron microscopy or other suitable
screening technique;

(c) map the epitopes recognized by the monoclonal
antibodies by determining the reactivity of the
monoclonals with a panel of recombinant fusion
proteins; and

(d) determine the reactivity of the monoclonal
antibodies with heterologous non-typeable Haemophilus
influenzae strains using standard Western blot
assays.

Using this approach, the inventor has identified one
monoclonal antibody, designated AD6 (ATCC ), which
recognized a surface-exposed B-cell epitope common to all
non-typeable H. influenzae which express the HMW1 and
HMW2 proteins. The epitope recognized by this antibody
was mapped to a 75 amino acid sequence at the carboxy
termini of both HMW1 and HMW2 proteins. The ability to
identify shared surface-exposed epitopes on the high
molecular weight adhesion proteins suggests that it would
be possible to develop recombinant or synthetic peptide
based vaccines which would be protective against disease
caused by the majority of non-typeable Haemophilus
influenzae.
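
The logic of step (c), mapping an epitope with a panel of recombinant fusion proteins, can be illustrated with a minimal sketch. It assumes a linear epitope that must lie wholly within every fragment the antibody recognizes, so the candidate region is the overlap of all reactive fragments. The function name, the 1000-residue protein and the fragment coordinates are hypothetical and are not taken from the experimental data.

```python
from typing import List, Optional, Tuple

# (first residue, last residue, reactive with the antibody?), 1-based inclusive
Fragment = Tuple[int, int, bool]


def localize_epitope(fragments: List[Fragment]) -> Optional[Tuple[int, int]]:
    """Return the region shared by all reactive fragments, or None."""
    reactive = [(s, e) for s, e, hit in fragments if hit]
    if not reactive:
        return None
    # The epitope must sit inside every reactive fragment.
    start = max(s for s, _ in reactive)
    end = min(e for _, e in reactive)
    if start > end:
        return None  # reactive fragments do not overlap
    # A non-reactive fragment covering the whole candidate region
    # contradicts the linear-epitope assumption.
    for s, e, hit in fragments:
        if not hit and s <= start and e >= end:
            raise ValueError("non-reactive fragment spans the candidate region")
    return start, end


# Hypothetical panel for a hypothetical 1000-residue protein.
panel = [
    (1, 1000, True),    # full-length fusion protein reacts
    (500, 1000, True),  # carboxy-terminal half reacts
    (926, 1000, True),  # last 75 residues react
    (1, 925, False),    # everything upstream does not react
]
region = localize_epitope(panel)
region_length = region[1] - region[0] + 1
```

With this invented panel the candidate region is the carboxy-terminal 75 residues, mirroring the kind of result described above.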
The present invention also provides any variant or
fragment of the proteins that retains the potential
immunological ability to protect against disease caused
by non-typeable Haemophilus strains. The variants may be
constructed by partial deletions or mutations of the
genes and expression of the resulting modified genes to
give the protein variants.

It is clearly apparent to one skilled in the art
that the various embodiments of the present invention
have many applications in the fields of vaccination,
diagnosis, treatment of bacterial infections and the
generation of immunological reagents. A further
non-limiting discussion of such uses is presented
below.

1. Vaccine Preparation and Use

Immunogenic compositions, suitable to be used as
vaccines, may be prepared from the high molecular weight
proteins of Haemophilus influenzae, as well as analogs
and fragments thereof, and synthetic peptides containing
epitopes of the protein, as disclosed herein. The
immunogenic composition elicits an immune response which
produces antibodies, including anti-high molecular weight
protein antibodies, and antibodies that are opsonizing or
bactericidal.

Immunogenic compositions, including vaccines, may be
prepared as injectables, as liquid solutions or
emulsions. The active component may be mixed with
pharmaceutically acceptable excipients which are
compatible therewith. Such excipients may include
water, saline, dextrose, glycerol, ethanol, and
combinations thereof. The immunogenic compositions and
vaccines may further contain auxiliary substances, such
as wetting or emulsifying agents, pH buffering agents, or
adjuvants to enhance the effectiveness thereof.

Immunogenic compositions and vaccines may be administered
parenterally, by injection subcutaneously or
intramuscularly. Alternatively, the immunogenic
compositions formed according to the present invention
may be formulated and delivered in a manner to evoke an
immune response at mucosal surfaces. Thus, the
immunogenic composition may be administered to mucosal
surfaces by, for example, the nasal or oral
(intragastric) routes. Alternatively, other modes of
administration including suppositories and oral
formulations may be desirable. For suppositories,
binders and carriers may include, for example,
polyalkylene glycols or triglycerides. Oral formulations
may include normally employed excipients such as, for
example, pharmaceutical grades of saccharine, cellulose
and magnesium carbonate. These compositions can take the
form of solutions, suspensions, tablets, pills, capsules,
sustained release formulations or powders and contain
about 1 to 95% of the active component. The immunogenic
preparations and vaccines are administered in a manner
compatible with the dosage formulation, and in such
amount as will be therapeutically effective, protective
and immunogenic. The quantity to be administered depends
on the subject to be treated, including, for example, the
capacity of the individual's immune system to synthesize
antibodies, and if needed, to produce a cell-mediated
immune response. Precise amounts of active ingredient
required to be administered depend on the judgment of the
practitioner. However, suitable dosage ranges are
readily determinable by one skilled in the art and may be
of the order of micrograms of the HMW proteins. Suitable
regimes for initial administration and booster doses are
also variable, but may include an initial administration
followed by subsequent administrations. The dosage may
also depend on the route of administration and will vary
according to the size of the host.
The concentration of the active component in an
immunogenic composition according to the invention is in
general about 1 to 95%. A vaccine which contains
antigenic material of only one pathogen is a monovalent
vaccine. Vaccines which contain antigenic material of
several pathogens are combined vaccines and also belong
to the present invention. Such combined vaccines
contain, for example, material from various pathogens or
from various strains of the same pathogen, or from
combinations of various pathogens.

Immunogenicity can be significantly improved if the
antigens are co-administered with adjuvants, commonly
used as 0.05 to 0.1 percent solution in phosphate-
buffered saline. Adjuvants enhance the immunogenicity of
an antigen but are not necessarily immunogenic
themselves. Adjuvants may act by retaining the antigen
locally near the site of administration to produce a
depot effect facilitating a slow, sustained release of
antigen to cells of the immune system. Adjuvants can
also attract cells of the immune system to an antigen
depot and stimulate such cells to elicit immune
responses.

Immunostimulatory agents or adjuvants have been used
for many years to improve the host immune responses to,
for example, vaccines. Intrinsic adjuvants, such as
lipopolysaccharides, normally are the components of the
killed or attenuated bacteria used as vaccines.
Extrinsic adjuvants are immunomodulators which are
typically non-covalently linked to antigens and are
formulated to enhance the host immune responses. Thus,
adjuvants have been identified that enhance the immune
response to antigens delivered parenterally. Some of
these adjuvants are toxic, however, and can cause
undesirable side-effects, making them unsuitable for use
in humans and many animals. Indeed, only aluminum
hydroxide and aluminum phosphate (collectively commonly
referred to as alum) are routinely used as adjuvants in
human and veterinary vaccines. The efficacy of alum in
increasing antibody responses to diphtheria and tetanus
toxoids is well established and a HBsAg vaccine has been
adjuvanted with alum. While the usefulness of alum is
well established for some applications, it has
limitations. For example, alum is ineffective for
influenza vaccination and inconsistently elicits a cell
mediated immune response. The antibodies elicited by
alum-adjuvanted antigens are mainly of the IgG1 isotype
in the mouse, which may not be optimal for protection by
some vaccinal agents.

A wide range of extrinsic adjuvants can provoke 
potent immune responses to antigens. These include 
saponins complexed to membrane protein antigens (immune 
stimulating complexes), pluronic polymers with mineral 
oil, killed mycobacteria in mineral oil, Freund's
complete adjuvant, bacterial products, such as muramyl 
dipeptide (MDP) and lipopolysaccharide (LPS), as well as 
lipid A, and liposomes. 

To efficiently induce humoral immune responses (HIR)
and cell-mediated immunity (CMI), immunogens are often
emulsified in adjuvants. Many adjuvants are toxic,
inducing granulomas, acute and chronic inflammations
(Freund's complete adjuvant, FCA), cytolysis (saponins
and Pluronic polymers) and pyrogenicity, arthritis and
anterior uveitis (LPS and MDP). Although FCA is an
excellent adjuvant and widely used in research, it is not
licensed for use in human or veterinary vaccines because
of its toxicity.

Desirable characteristics of ideal adjuvants
include:

(1) lack of toxicity;

(2) ability to stimulate a long-lasting immune response;

(3) simplicity of manufacture and stability in long-term
storage;

(4) ability to elicit both CMI and HIR to antigens
administered by various routes, if required;

(5) synergy with other adjuvants;

(6) capability of selectively interacting with
populations of antigen presenting cells (APC);

(7) ability to specifically elicit appropriate Th1 or
Th2 cell-specific immune responses; and

(8) ability to selectively increase appropriate antibody
isotype levels (for example, IgA) against antigens.

U.S. Patent No. 4,855,283 granted to Lockhoff et al.
on August 8, 1989, which is incorporated herein by
reference thereto, teaches glycolipid analogues including
N-glycosylamides, N-glycosylureas and
N-glycosylcarbamates, each of which is substituted in the
sugar residue by an amino acid, as immuno-modulators or
adjuvants. Thus, Lockhoff et al. (U.S. Patent No.
4,855,283 and ref. 29) reported that N-glycolipid analogs
displaying structural similarities to the naturally-
occurring glycolipids, such as glycosphingolipids and
glycoglycerolipids, are capable of eliciting strong
immune responses in both herpes simplex virus vaccine and
pseudorabies virus vaccine. Some glycolipids have been
synthesized from long-chain alkylamines and fatty acids
that are linked directly with the sugars through the
anomeric carbon atom, to mimic the functions of the
naturally occurring lipid residues.

U.S. Patent No. 4,258,029 granted to Moloney,
incorporated herein by reference thereto, teaches that
octadecyl tyrosine hydrochloride (OTH) functioned as an
adjuvant when complexed with tetanus toxoid and formalin
inactivated type I, II and III poliomyelitis virus
vaccine. Also, Nixon-George et al. (ref. 30) reported
that octadecyl esters of aromatic amino acids complexed
with a recombinant hepatitis B surface antigen enhanced
the host immune responses against hepatitis B virus.

Lipidation of synthetic peptides has also been used
to increase their immunogenicity. Thus, Wiesmuller 1989
describes a peptide with a sequence homologous to a
foot-and-mouth disease viral protein coupled to an adjuvant
tripalmityl-S-glyceryl-cysteinylserylserine, being a
synthetic analogue of the N-terminal part of the
lipoprotein from Gram negative bacteria. Furthermore,
Deres et al. 1989 reported in vivo priming of virus-
specific cytotoxic T lymphocytes with synthetic
lipopeptide vaccine which comprised modified synthetic
peptides derived from influenza virus nucleoprotein by
linkage to a lipopeptide,
N-palmityl-S-[2,3-bis(palmitylxy)-(2RS)-propyl]-[R]-cysteine
(TPC).

2. Immunoassays

The high molecular weight protein of Haemophilus
influenzae of the present invention is useful as an
immunogen for the generation of anti-protein antibodies,
as an antigen in immunoassays including enzyme-linked
immunosorbent assays (ELISA), RIAs and other non-enzyme
linked antibody binding assays or procedures known in the
art for the detection of antibodies. In ELISA assays,
the protein is immobilized onto a selected surface, for
example, a surface capable of binding proteins, such as
the wells of a polystyrene microtiter plate. After
washing to remove incompletely adsorbed protein, a
nonspecific protein, such as a solution of bovine serum
albumin (BSA) that is known to be antigenically neutral
with regard to the test sample, may be bound to the
selected surface. This allows for blocking of
nonspecific adsorption sites on the immobilizing surface
and thus reduces the background caused by nonspecific
bindings of antisera onto the surface.

The immobilizing surface is then contacted with a
sample, such as clinical or biological materials, to be
tested in a manner conducive to immune complex
(antigen/antibody) formation. This may include diluting
the sample with diluents, such as solutions of BSA,
bovine gamma globulin (BGG) and/or phosphate buffered
saline (PBS)/Tween. The sample is then allowed to
incubate for from about 2 to 4 hours, at temperatures
such as of the order of about 25° to 37°C. Following
incubation, the sample-contacted surface is washed to
remove non-immunocomplexed material. The washing
procedure may include washing with a solution, such as
PBS/Tween or a borate buffer. Following formation of
specific immunocomplexes between the test sample and the
bound protein, and subsequent washing, the occurrence,
and even amount, of immunocomplex formation may be
determined by subjecting the immunocomplex to a second
antibody having specificity for the first antibody. If
the test sample is of human origin, the second antibody
is an antibody having specificity for human
immunoglobulins and in general IgG. To provide detecting
means, the second antibody may have an associated
activity such as an enzymatic activity that will
generate, for example, a colour development upon
incubating with an appropriate chromogenic substrate.
Quantification may then be achieved by measuring the
degree of colour generation using, for example, a visible
spectra spectrophotometer.
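
As an illustration of how colour readings might be turned into a positive or negative call, the following minimal sketch scores sample absorbances against a cutoff derived from blank (no-serum) wells. The cutoff rule (mean of the blanks plus three standard deviations) is a common laboratory convention assumed here for illustration; it is not specified in this disclosure, and the function names and readings are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def elisa_cutoff(blank_ods: List[float], n_sd: float = 3.0) -> float:
    """Cutoff = mean of blank wells + n_sd sample standard deviations.

    The n_sd = 3 rule is an assumed convention, not part of the source.
    """
    return mean(blank_ods) + n_sd * stdev(blank_ods)


def score_samples(sample_ods: Dict[str, float],
                  blank_ods: List[float],
                  n_sd: float = 3.0) -> Dict[str, bool]:
    """Mark each sample positive when its absorbance exceeds the cutoff."""
    cutoff = elisa_cutoff(blank_ods, n_sd)
    return {name: od > cutoff for name, od in sample_ods.items()}


# Hypothetical absorbance readings.
blanks = [0.05, 0.06, 0.04, 0.05]
samples = {"serum_A": 0.80, "serum_B": 0.06}
calls = score_samples(samples, blanks)
```

In practice the threshold would be validated against known positive and negative sera for the assay in question.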

3. Use of Sequences as Hybridization Probes 

The nucleotide sequences of the present invention,
comprising the sequences of the genes encoding the high
molecular weight proteins of specific strains of
non-typeable Haemophilus influenzae, now allow for the
identification and cloning of the genes from any species
of non-typeable Haemophilus and other strains of
non-typeable Haemophilus influenzae.

The nucleotide sequences comprising the sequences of
the genes of the present invention are useful for their
ability to selectively form duplex molecules with
complementary stretches of other genes of high molecular
weight proteins of non-typeable Haemophilus. Depending
on the application, a variety of hybridization conditions
may be employed to achieve varying degrees of selectivity
of the probe toward the other genes. For a high degree
of selectivity, relatively stringent conditions are used
to form the duplexes, such as low salt and/or high
temperature conditions, such as provided by 0.02 M to
0.15 M NaCl at temperatures of between about 50°C to 70°C.
For some applications, less stringent hybridization
conditions are required such as 0.15 M to 0.9 M salt, at
temperatures ranging from between about 20°C to 55°C.
Hybridization conditions can also be rendered more
stringent by the addition of increasing amounts of
formamide, to destabilize the hybrid duplex. Thus,
particular hybridization conditions can be readily
manipulated, and will generally be a method of choice
depending on the desired results. In general, convenient
hybridization temperatures in the presence of 50%
formamide are: 42°C for a probe which is 95 to 100%
homologous to the target fragment, 37°C for 90 to 95%
homology and 32°C for 85 to 90% homology.
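
The temperature guidance in the preceding paragraph amounts to a small lookup, sketched below. The three temperatures and homology ranges come from the text; the function name is hypothetical, and the handling of the shared boundaries (a value of exactly 95% or 90% is assigned to the higher temperature) is an assumption, since the stated ranges overlap at their end points.

```python
def hybridization_temp_c(percent_homology: float) -> int:
    """Convenient hybridization temperature (deg C) in 50% formamide.

    Ranges follow the text: 95-100% -> 42, 90-95% -> 37, 85-90% -> 32.
    Boundary values go to the higher temperature (an assumption).
    """
    if not 0 <= percent_homology <= 100:
        raise ValueError("homology must be a percentage between 0 and 100")
    if percent_homology >= 95:
        return 42
    if percent_homology >= 90:
        return 37
    if percent_homology >= 85:
        return 32
    # The text gives no guidance below 85% homology.
    raise ValueError("no temperature is given for homology below 85%")


temps = [hybridization_temp_c(h) for h in (97, 92, 86)]
```

Homology below 85% raises an error rather than guessing, because the text does not cover that range.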

In a clinical diagnostic embodiment, the nucleic
acid sequences of the genes of the present invention may
be used in combination with an appropriate means, such as
a label, for determining hybridization. A wide variety
of appropriate indicator means are known in the art,
including radioactive, enzymatic or other ligands, such
as avidin/biotin, which are capable of providing a
detectable signal. In some diagnostic embodiments, an
enzyme tag such as urease, alkaline phosphatase or
peroxidase, instead of a radioactive tag, may be used. In
the case of enzyme tags, colorimetric indicator
substrates are known which can be employed to provide a
means visible to the human eye or spectrophotometrically,
to identify specific hybridization with samples
containing gene sequences encoding high molecular weight
proteins of non-typeable Haemophilus.

The nucleic acid sequences of genes of the present
invention are useful as hybridization probes in solution
hybridizations and in embodiments employing solid-phase
procedures. In embodiments involving solid-phase
procedures, the test DNA (or RNA) from samples, such as
clinical samples, including exudates, body fluids (e.g.,
serum, amniotic fluid, middle ear effusion, sputum,
bronchoalveolar lavage fluid) or even tissues, is
adsorbed or otherwise affixed to a selected matrix or
surface. The fixed, single-stranded nucleic acid is then
subjected to specific hybridization with selected probes
comprising the nucleic acid sequences of the genes or
fragments thereof of the present invention under desired
conditions. The selected conditions will depend on the
particular circumstances based on the particular criteria
required depending on, for example, the G+C contents,
type of target nucleic acid, source of nucleic acid, size
of hybridization probe etc. Following washing of the
hybridization surface so as to remove non-specifically
bound probe molecules, specific hybridization is
detected, or even quantified, by means of the label. As
with the selection of peptides, it is preferred to select
nucleic acid sequence portions which are conserved among
species of non-typeable Haemophilus. The selected probe
may be at least about 18 bp and may be in the range of
about 30 bp to about 90 bp long.
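
One way to look for candidate probe regions of the kind preferred above is to scan pre-aligned gene sequences for runs of identical columns that reach the minimum probe length. The sketch below is illustrative only: it assumes the sequences are already aligned to equal length, upper-case, with gaps written as "-", and the function name and toy sequences are hypothetical.

```python
from typing import List, Tuple


def conserved_windows(aligned: List[str], min_len: int = 18) -> List[Tuple[int, int]]:
    """Return (start, end) runs, 0-based and end-exclusive, in which every
    aligned sequence carries the same non-gap base for at least min_len
    consecutive columns."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    windows = []
    run_start = None
    # Iterate one column past the end so a run reaching the end is closed.
    for i in range(length + 1):
        conserved = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if conserved and run_start is None:
            run_start = i
        elif not conserved and run_start is not None:
            if i - run_start >= min_len:
                windows.append((run_start, i))
            run_start = None
    return windows


# Toy alignment: 20 identical columns, one mismatch, 4 identical columns.
seq_a = "ACGT" * 5 + "A" + "GGCC"
seq_b = "ACGT" * 5 + "T" + "GGCC"
probe_regions = conserved_windows([seq_a, seq_b], min_len=18)
```

The default of 18 reflects the minimum probe length given in the text; a real selection would also weigh G+C content and the other criteria listed above.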
4. Expression of the High Molecular Weight Protein Genes

Plasmid vectors containing replicon and control
sequences which are derived from species compatible with
the host cell may be used for the expression of the genes
encoding high molecular weight proteins of non-typeable
Haemophilus in expression systems. The vector ordinarily
carries a replication site, as well as marking sequences
which are capable of providing phenotypic selection in
transformed cells. For example, E. coli may be
transformed using pBR322 which contains genes for
ampicillin and tetracycline resistance and thus provides
easy means for identifying transformed cells. The pBR322
plasmid, or other microbial plasmid or phage, must also
contain, or be modified to contain, promoters which can
be used by the host cell for expression of its own
proteins.

In addition, phage vectors containing replicon and 
control sequences that are compatible with the host can 
be used as a transforming vector in connection with these 
hosts. For example, the phage in lambda GEM™-11 may be 
utilized in making recombinant phage vectors which can be
used to transform host cells, such as E. coli LE392. 

Promoters commonly used in recombinant DNA
construction include the β-lactamase (penicillinase) and
lactose promoter systems (Chang et al., 1978; Itakura et
al., 1977; Goeddel et al., 1979; Goeddel et al., 1980) and
other microbial promoters such as the T7 promoter system
(U.S. Patent 4,952,496). Details concerning the
nucleotide sequences of promoters are known, enabling a
skilled worker to ligate them functionally with genes.
The particular promoter used will generally be a matter
of choice depending upon the desired results. Hosts that
are appropriate for expression of the genes encoding the
high molecular weight proteins, fragments, analogs or
variants thereof, include E. coli, Bacillus species,
Haemophilus, fungi and yeast; the baculovirus expression
system also may be used.

In accordance with this invention, it is preferred
to make the high molecular weight proteins by recombinant
methods, particularly since the naturally occurring high
molecular weight protein as purified from a culture of a
species of non-typeable Haemophilus may include trace
amounts of toxic materials or other contaminants. This
problem can be avoided by using recombinantly produced
proteins in heterologous systems which can be isolated
from the host in a manner to minimize contaminants in the
purified material. Particularly desirable hosts for
expression in this regard include Gram positive bacteria
which do not have LPS and are, therefore, endotoxin free.
Such hosts include species of Bacillus and may be
particularly useful for the production of non-pyrogenic
high molecular weight protein, fragments or analogs
thereof. Furthermore, recombinant methods of production
permit the manufacture of HMW1, HMW2, HMW3 or HMW4, and
corresponding HMW proteins from other non-typeable
Haemophilus influenzae strains, or fragments thereof,
separate from one another and devoid of non-HMW protein
of non-typeable Haemophilus influenzae.

Biological Deposits 

Certain hybridomas producing monoclonal antibodies
specific for high molecular weight protein of Haemophilus
influenzae according to aspects of the present invention
that are described and referred to herein have been
deposited with the American Type Culture Collection
(ATCC) located at 12301 Parklawn Drive, Rockville,
Maryland, USA, 20852, pursuant to the Budapest Treaty and
prior to the filing of this application. Samples of the
deposited hybridomas will become available to the public
upon grant of a patent based upon this United States
patent application. The invention described and claimed
herein is not to be limited in scope by the hybridomas
deposited, since the deposited embodiment is intended
only as an illustration of the invention. Any equivalent
or similar hybridomas that produce similar or equivalent
antibodies as described in this application are within
the scope of the invention.

Deposit Summary

Hybridoma    ATCC Designation    Date Deposited

AD6
10C5
EXAMPLES

The above disclosure generally describes the present
invention. A more complete understanding can be obtained
by reference to the following specific Examples. These
Examples are described solely for purposes of
illustration and are not intended to limit the scope of
the invention. Changes in form and substitution of
equivalents are contemplated as circumstances may suggest
or render expedient. Although specific terms have been
employed herein, such terms are intended in a descriptive
sense and not for purposes of limitations.

Methods of molecular genetics, protein biochemistry,
and immunology used but not explicitly described in this
disclosure and these Examples are amply reported in the
scientific literature and are well within the ability of
those skilled in the art.

Example 1.

This Example describes the isolation of DNA encoding
HMW1 and HMW2 proteins, cloning and expression of such
proteins, and sequencing and sequence analysis of the DNA
molecules encoding the HMW1 and HMW2 proteins.

Non-typeable H. influenzae strains 5 and 12 were
isolated in pure culture from the middle ear fluid of
children with acute otitis media. Chromosomal DNA from
strain 12, providing genes encoding proteins HMW1 and
HMW2, was prepared, partially digested with Sau3A, and
fractionated on sucrose gradients. Fractions containing
DNA fragments in the 9 to 20 kbp range were pooled and a
library was prepared by ligation into λEMBL3 arms.
Ligation mixtures were packaged in vitro and
plate-amplified in a P2 lysogen of E. coli LE392.

For plasmid subcloning studies, DNA from a
representative recombinant phage was subcloned into the
T7 expression plasmid pT7-7, containing the T7 RNA
polymerase promoter Φ10, a ribosome-binding site and the
translational start site for the T7 gene 10 protein
upstream from a multiple cloning site (see Figure 5B).

DNA sequence analysis was performed by the dideoxy
method and both strands of the HMW1 gene and a single
strand of the HMW2 gene were sequenced.

Western immunoblot analysis was performed to
identify the recombinant proteins being produced by
reactive phage clones (Figure 11). Phage lysates grown
in LE392 cells or plaques picked directly from a lawn of
LE392 cells on YT plates were solubilized in gel
electrophoresis sample buffer prior to electrophoresis.
Sodium dodecyl sulfate polyacrylamide gel electrophoresis
(SDS-PAGE) was performed on 7.5% or 11% polyacrylamide
modified Laemmli gels. After transfer of the proteins to
nitrocellulose sheets, the sheets were probed
sequentially with an E. coli-absorbed human serum sample
containing high-titer antibody to the high-molecular-
weight proteins and then with alkaline phosphatase-
conjugated goat anti-human immunoglobulin G (IgG) second
antibody. Sera from healthy adults contain high-titer
antibody directed against surface-exposed high-molecular-
weight proteins of non-typeable H. influenzae. One such
serum sample was used as the screening antiserum after
having been extensively absorbed with LE392 cells.

To identify recombinant proteins being produced by
E. coli transformed with recombinant plasmids, the
plasmids of interest were used to transform E. coli BL21
(DE3)/pLysS. The transformed strains were grown to an
A600 of 0.5 in L broth containing 50 μg of ampicillin per
ml. IPTG was then added to 1 mM. One hour later, cells
were harvested, and a sonicate of the cells was prepared.
The protein concentrations of the samples were determined
by the bicinchoninic acid method. Cell sonicates
containing 100 μg of total protein were solubilized in
electrophoresis sample buffer, subjected to SDS-
polyacrylamide gel electrophoresis, and transferred to
nitrocellulose. The nitrocellulose was then probed
sequentially with the E. coli-absorbed adult serum sample
and then with alkaline phosphatase-conjugated goat anti-
human IgG second antibody.

Western immunoblot analysis also was performed to
determine whether homologous and heterologous
non-typeable H. influenzae strains expressed high-molecular-
weight proteins antigenically related to the protein
encoded by the cloned HMW1 gene (rHMW1). Cell sonicates
of bacterial cells were solubilized in electrophoresis
sample buffer, subjected to SDS-polyacrylamide gel
electrophoresis, and transferred to nitrocellulose.
Nitrocellulose was probed sequentially with polyclonal
rabbit rHMW1 antiserum and then with alkaline
phosphatase-conjugated goat anti-rabbit IgG second
antibody.

Finally, Western immunoblot analysis was performed
to determine whether non-typeable Haemophilus strains
expressed proteins antigenically related to the
filamentous hemagglutinin protein of Bordetella
pertussis. Monoclonal antibody X3C, a murine
immunoglobulin G (IgG) antibody which recognizes
filamentous hemagglutinin, was used to probe cell
sonicates by Western blot. An alkaline phosphatase-
conjugated goat anti-mouse IgG second antibody was used
for detection.

To generate recombinant protein antiserum, E. coli
BL21(DE3)/pLysS was transformed with pHMW1-4, and
expression of recombinant protein was induced with IPTG,
as described above. A cell sonicate of the bacterial
cells was prepared and separated into a supernatant and
pellet fraction by centrifugation at 10,000 x g for 30
min. The recombinant protein fractionated with the
pellet fraction. A rabbit was subcutaneously immunized
on a biweekly schedule with 1 mg of protein from the pellet
fraction, the first dose given with Freund's complete
adjuvant and subsequent doses with Freund's incomplete
adjuvant. Following the fourth injection, the rabbit was
bled. Prior to use in the Western blot assay, the
antiserum was absorbed extensively with sonicates of the
host E. coli strain transformed with cloning vector
alone.

To assess the sharing of antigenic determinants
between HMW1 and filamentous hemagglutinin, enzyme-linked
immunosorbent assay (ELISA) plates (Costar, Cambridge,
Mass.) were coated with 60 μl of a solution of
filamentous hemagglutinin in Dulbecco's phosphate-
buffered saline per well for 2 h at room temperature.
Wells were blocked for 1 h with 1% bovine serum albumin
in Dulbecco's phosphate-buffered saline prior to addition
of serum dilutions. rHMW1 antiserum was serially diluted
in 0.1% Brij (Sigma, St. Louis, Mo.) in Dulbecco's
phosphate-buffered saline and incubated for 3 h at room
temperature. After being washed, the plates were
incubated with peroxidase-conjugated goat anti-rabbit IgG
antibody (Bio-Rad) for 2 h at room temperature and
subsequently developed with
2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid)
(Sigma) at a concentration of 0.54 mg/ml in 0.1 M sodium
citrate buffer, pH 4.2, containing 0.03% H2O2. Absorbances
were read on an automated ELISA reader.

Recombinant phage expressing HMW1 or HMW2 were
recovered as follows. The non-typeable H. influenzae
strain 12 genomic library was screened for clones
expressing high-molecular-weight proteins with an E.
coli-absorbed human serum sample containing a high titer
of antibodies directed against the high-molecular-weight
proteins.

Numerous strongly reactive clones were identified
along with more weakly reactive ones. Twenty strongly
reactive clones were plaque-purified and examined by
Western blot for expression of recombinant proteins.
Each of the strongly reactive clones expressed one of two
types of high-molecular-weight proteins, designated HMW1
and HMW2. The major immunoreactive protein bands in the
HMW1 and HMW2 lysates migrated with apparent molecular
masses of 125 and 120 kDa, respectively. In addition to
the major bands, each lysate contained minor protein
bands of higher apparent molecular weight. Protein bands
seen in the HMW2 lysates at molecular masses of less than
120 kDa were not regularly observed and presumably
represent proteolytic degradation products. Lysates of
LE392 infected with the λEMBL3 cloning vector alone were
non-reactive when immunologically screened with the same
serum sample. Thus, the observed activity was not due to
cross-reactive E. coli proteins or λEMBL3-encoded
proteins. Furthermore, the recombinant proteins were not
simply binding immunoglobulin nonspecifically, since the
proteins were not reactive with the goat anti-human IgG
conjugate alone, with normal rabbit sera, or with serum
from a number of healthy young infants.
Representative clones expressing either the HMW1 or
HMW2 recombinant proteins were characterized further.
The restriction maps of the two phage types were
different from each other, including the regions encoding
the HMW1 and HMW2 structural genes. Figure 5A shows
restriction maps of representative recombinant phage
which contained the HMW1 or HMW2 structural genes. The
locations of the structural genes are indicated by the
shaded bars.

HMW1 plasmid subclones were constructed by using the
T7 expression plasmid pT7-7 (Fig. 5A and B). HMW2 plasmid
subclones also were constructed, and the results with
these latter subclones were similar to those observed
with the HMW1 constructs.

The approximate location and direction of
transcription of the HMW1 structural gene were initially
determined by using plasmid pHMW1 (Fig. 5A). This
plasmid was constructed by inserting the 8.5-kb BamHI-
SalI fragment from λHMW1 into BamHI- and SalI-cut pT7-7.
E. coli transformed with pHMW1 expressed an
immunoreactive recombinant protein with an apparent
molecular mass of 115 kDa, which was strongly inducible
with IPTG. This protein was significantly smaller than
the 125-kDa major protein expressed by the parent phage,
indicating that it either was being expressed as a fusion
protein or was truncated at the carboxy terminus.

To more precisely localize the 3' end of the
structural gene, additional plasmids were constructed
with progressive deletions from the 3' end of the pHMW1
construct. Plasmid pHMW1-1 was constructed by digestion
of pHMW1 with PstI, isolation of the resulting 8.8-kb
fragment, and religation. Plasmid pHMW1-2 was
constructed by digestion of pHMW1 with HindIII, isolation
of the resulting 7.5-kb fragment, and religation. E.
coli transformed with either plasmid pHMW1-1 or pHMW1-2
also expressed an immunoreactive recombinant protein with
an apparent molecular mass of 115 kDa. These results
indicated that the 3' end of the structural gene was 5'
of the HindIII site. Figure 12 demonstrates the Western
blot results with pHMW1-2 transformed cells before and
after IPTG induction (lanes 3 and 4, respectively). The
115 kDa recombinant protein is indicated by the arrow.
Transformants also demonstrated cross-reactive bands of
lower apparent molecular weight, which probably represent
partial degradation products. Shown for comparison are
the results for E. coli transformed with the pT7-7
cloning vector alone (Fig. 12, lanes 1 and 2).

To more precisely localize the 5' end of the gene,
plasmids pHMW1-4 and pHMW1-7 were constructed. Plasmid
pHMW1-4 was constructed by cloning the 5.1-kb BamHI-
HindIII fragment from λHMW1 into a pT7-7-derived plasmid
containing the upstream 3.8-kb EcoRI-BamHI fragment. E.
coli transformed with pHMW1-4 expressed an immunoreactive
protein with an apparent molecular mass of approximately
160 kDa (Fig. 12, lane 6). Although protein production
was inducible with IPTG, the levels of protein production
in these transformants were substantially lower than
those with the pHMW1-2 transformants described above.
Plasmid pHMW1-7 was constructed by digesting pHMW1-4 with
NdeI and SpeI. The 9.0-kbp fragment generated by this
double digestion was isolated, blunt ended, and
religated. E. coli transformed with pHMW1-7 also
expressed an immunoreactive protein with an apparent
molecular mass of 160 kDa, a protein identical in size to
that expressed by the pHMW1-4 transformants. The result
indicated that the initiation codon for the HMW1
structural gene was 3' of the SpeI site. DNA sequence
analysis (described below) confirmed this conclusion.

As noted above, the λHMW1 phage clones expressed a
major immunoreactive band of 125 kDa, whereas the HMW1
plasmid clones pHMW1-4 and pHMW1-7, which contained what
was believed to be the full-length gene, expressed an
immunoreactive protein of approximately 160 kDa. This
size discrepancy was disconcerting. One possible
explanation was that an additional gene or genes
necessary for correct processing of the HMW1 gene product
were deleted in the process of subcloning. To address
this possibility, plasmid pHMW1-14 was constructed. This
construct was generated by digesting pHMW1 with NdeI and
MluI and inserting the 7.6-kbp NdeI-MluI fragment
isolated from pHMW1-4. Such a construct would contain
the full-length HMW1 gene as well as the DNA 3' of the
HMW1 gene which was present in the original HMW1 phage.
E. coli transformed with this plasmid expressed major
immunoreactive proteins with apparent molecular masses of
125 and 160 kDa as well as additional degradation
products (Fig. 12, lanes 7 and 8). The 125- and 160-kDa
bands were identical to the major and minor
immunoreactive bands detected in the HMW1 phage lysates.
Interestingly, the pHMW1-14 construct also expressed
significant amounts of protein in the uninduced
condition, a situation not observed with the earlier
constructs.

The relationship between the 125- and 160-kDa
proteins remains somewhat unclear. Sequence analysis,
described below, reveals that the HMW1 gene would be
predicted to encode a protein of 159 kDa. It is believed
that the 160-kDa protein is a precursor form of the
mature 125-kDa protein, with the conversion from one
protein to the other being dependent on the products of
the two downstream genes.

Sequence analysis of the HMW1 gene (Figure 1)
revealed a 4,608-bp open reading frame (ORF), beginning
with an ATG codon at nucleotide 351 and ending with a TAG
stop codon at nucleotide 4959. A putative ribosome-
binding site with the sequence AGGAG begins 10 bp
upstream of the putative initiation codon. Five other
in-frame ATG codons are located within 250 bp of the
beginning of the ORF, but none of these is preceded by a
typical ribosome-binding site. The 5'-flanking region of
the ORF contains a series of direct tandem repeats, with
the 7-bp sequence ATCTTTC repeated 16 times. These
tandem repeats stop 100 bp 5' of the putative initiation
codon. An 8-bp inverted repeat characteristic of a rho-
independent transcriptional terminator is present,
beginning at nucleotide 4983, 25 bp 3' of the presumed
translational stop. Multiple termination codons are
present in all three reading frames both upstream and
downstream of the ORF. The derived amino acid sequence
of the protein encoded by the HMW1 gene (Figure 2) has a
molecular weight of 159,000, in good agreement with the
apparent molecular weights of the proteins expressed by
the HMW1-4 and HMW1-7 transformants. The derived amino
acid sequence of the amino terminus does not demonstrate
the characteristics of a typical signal sequence. The
BamHI site used in generation of pHMW1 comprises bp 1743
through 1748 of the nucleotide sequence. The ORF
downstream of the BamHI site would be predicted to encode
a protein of 111 kDa, in good agreement with the 115 kDa
estimated for the apparent molecular mass of the pHMW1-
encoded fusion protein.
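
By way of illustration only, the tandem-repeat feature
described above can be checked computationally. The
following Python sketch counts back-to-back copies of a
repeat unit in a nucleotide string. The function name and
the toy sequence are assumptions made for this example;
the toy sequence is not the sequence of Figure 1 and is
built only so that it carries 16 copies of ATCTTTC
followed by a 100-nt spacer and an ATG.

```python
def longest_tandem_run(sequence: str, unit: str) -> tuple[int, int]:
    """Return (start_index, copies) for the longest run of
    back-to-back copies of `unit` in `sequence` (0-based index).
    Returns (-1, 0) if the unit never occurs."""
    best_start, best_copies = -1, 0
    k = len(unit)
    for i in range(len(sequence) - k + 1):
        copies = 0
        j = i
        # Count consecutive copies of the unit starting at position i.
        while sequence.startswith(unit, j):
            copies += 1
            j += k
        if copies > best_copies:
            best_start, best_copies = i, copies
    return best_start, best_copies


# Toy 5'-flanking region (hypothetical, NOT the Figure 1 sequence).
UNIT = "ATCTTTC"
toy_flank = "GGCA" + UNIT * 16 + "G" * 100 + "ATG"

start, copies = longest_tandem_run(toy_flank, UNIT)
repeat_end = start + copies * len(UNIT)
spacer = toy_flank.find("ATG", repeat_end) - repeat_end
print(f"{copies} copies starting at index {start}; "
      f"{spacer} nt between repeats and ATG")
```

Applied to a real 5'-flanking sequence, the same function
would report the copy number and position of the repeat
run, which could then be compared with the values stated
in the text.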

The sequence of the HMW2 gene (Figure 3) consists of
a 4,431-bp ORF, beginning with an ATG codon at nucleotide
352 and ending with a TAG stop codon at nucleotide 4783.
The first 1,259 bp of the ORF of the HMW2 gene are
identical to those of the HMW1 gene. Thereafter, the
sequences begin to diverge but are 80% identical overall.
With the exception of a single base addition at
nucleotide 93 of the HMW2 sequence, the 5'-flanking
regions of the HMW1 and HMW2 genes are identical for 310
bp upstream from the respective initiation codons. Thus,
the HMW2 gene is preceded by the same set of tandem
repeats and the same putative ribosome-binding site which
precede the HMW1 gene. A putative transcriptional
terminator identical to that identified 3' of the HMW1
ORF is noted, beginning at nucleotide 4804. The
discrepancy in the lengths of the two genes is
principally accounted for by a 186-bp gap in the HMW2
sequence, beginning at nucleotide position 3839. The
derived amino acid sequence of the protein encoded by the
HMW2 gene (Figure 4) has a molecular weight of 155,000
and is 71% identical with the derived amino acid sequence
of the HMW1 gene.
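
As a simple consistency check on the coordinates quoted
for the two ORFs, the stated lengths can be recomputed
from the stated start and end nucleotides. The Python
sketch below is offered only as an illustration; it
assumes that each quoted length equals the end coordinate
minus the start coordinate, which is what the quoted
figures imply, and the names used are hypothetical.

```python
# (start nucleotide of ATG, nucleotide quoted for the TAG stop, stated length)
ORFS = {
    "HMW1": (351, 4959, 4608),
    "HMW2": (352, 4783, 4431),
}


def orf_span(start_nt: int, end_nt: int) -> int:
    """Length implied by the quoted coordinates (assumed: end - start)."""
    return end_nt - start_nt


for name, (start_nt, end_nt, stated) in ORFS.items():
    span = orf_span(start_nt, end_nt)
    # An open reading frame must contain a whole number of codons.
    print(f"{name}: span {span} bp, stated {stated} bp, "
          f"whole codons: {span % 3 == 0}, codons: {span // 3}")

length_difference = (orf_span(*ORFS["HMW1"][:2])
                     - orf_span(*ORFS["HMW2"][:2]))
print(f"HMW1 ORF is {length_difference} bp longer than HMW2 ORF")
```

Both spans are multiples of three, as required for an
ORF, and the 177-bp difference between them is of the
same order as the 186-bp gap that the text identifies as
the principal source of the length discrepancy.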

The derived amino acid sequences of both the HMW1
and HMW2 genes (Figures 2 and 4) demonstrated sequence
similarity with the derived amino acid sequence of
filamentous hemagglutinin of Bordetella pertussis, a
surface-associated protein of this organism. The initial
and optimized TFASTA scores for the HMW1-filamentous
hemagglutinin sequence comparison were 87 and 186,
respectively, with a word size of 2. The z score for the
comparison was 45.8. The initial and optimized TFASTA
scores for the HMW2-filamentous hemagglutinin sequence
comparison were 68 and 196, respectively. The z score
for the latter comparison was 48.7. The magnitudes of
the initial and optimized TFASTA scores and the z scores
suggested that a biologically significant relationship
existed between the HMW1 and HMW2 gene products and
filamentous hemagglutinin. When the derived amino acid
sequences of HMW1, HMW2, and filamentous hemagglutinin
genes were aligned and compared, the similarities were
most notable at the amino-terminal ends of the three
sequences. Twelve of the first 22 amino acids in the
predicted peptide sequences were identical. In addition,
the sequences demonstrated a common five-amino-acid
stretch, Asn-Pro-Asn-Gly-Ile, and several shorter
stretches of sequence identity within the first 200 amino
acids.
Example 2:

This Example describes the relationship of
filamentous hemagglutinin and the HMW1 protein.

To further explore the HMW1-filamentous
hemagglutinin relationship, the ability of antiserum
prepared against the HMW1-4 recombinant protein (rHMW1)
to recognize purified filamentous hemagglutinin was
assessed (Figure 13). The rHMW1 antiserum demonstrated
ELISA reactivity with filamentous hemagglutinin in a
dose-dependent manner. Preimmune rabbit serum had
minimal reactivity in this assay. The rHMW1 antiserum
also was examined in a Western blot assay and
demonstrated weak but positive reactivity with purified
filamentous hemagglutinin in this system also.

To identify the native Haemophilus protein
corresponding to the HMW1 gene product and to determine
the extent to which proteins antigenically related to the
HMW1 cloned gene product were common among other non-
typeable H. influenzae strains, a panel of Haemophilus
strains was screened by Western blot with the rHMW1
antiserum. The antiserum recognized both a 125- and a
120-kDa protein band in the homologous strain 12 (Figure
14), the putative mature protein products of the HMW1 and
HMW2 genes, respectively. The 120-kDa protein appears as
a single band in Figure 14, whereas it appeared as a
doublet in the HMW2 phage lysates (Figure 11).

When used to screen heterologous non-typeable H.
influenzae strains, rHMW1 antiserum recognized high-
molecular-weight proteins in 75% of 125 epidemiologically
unrelated strains. In general, the antiserum reacted
with one or two protein bands in the 100- to 150-kDa
range in each of the heterologous strains in a pattern
similar but not identical to that seen in the homologous
strain (Figure 14).

Monoclonal antibody X3C is a murine IgG antibody
directed against the filamentous hemagglutinin protein of
B. pertussis. This antibody can inhibit the binding of
B. pertussis cells to Chinese hamster ovary cells and
HeLa cells in culture and will inhibit hemagglutination
of erythrocytes by purified filamentous hemagglutinin.
A Western blot assay was performed in which this
monoclonal antibody was screened against the same panel
of non-typeable H. influenzae strains discussed above
(Figure 14). Monoclonal antibody X3C recognized both the
high-molecular-weight proteins in non-typeable H.
influenzae strain 12 which were recognized by the
recombinant-protein antiserum (Figure 15). In addition,
the monoclonal antibody recognized protein bands in a
subset of heterologous non-typeable H. influenzae strains
which were identical to those recognized by the
recombinant-protein antiserum, as may be seen by
comparison of Figures 14 and 15. On occasion, the
filamentous hemagglutinin monoclonal antibody appeared to
recognize only one of the two bands which had been
recognized by the recombinant-protein antiserum (compare
strain lane 18 in Figures 14 and 15, for example).
Overall, monoclonal antibody X3C recognized high-
molecular-weight protein bands identical to those
recognized by the rHMW1 antiserum in approximately 35% of
our collection of non-typeable H. influenzae strains.

Example 3:

This Example describes the adhesin properties of the
HMW1 and HMW2 proteins.

Mutants deficient in expression of HMW1, HMW2 or
both proteins were constructed to examine the role of
these proteins in bacterial adherence. The following
strategy was employed. pHMW1-14 (see Example 1, Figure
5A) was digested with BamHI and then ligated to a
kanamycin cassette isolated on a 1.3-kb BamHI fragment
from pUC4K. The resultant plasmid (pHMW1-17) was
linearized by digestion with XbaI and transformed into
non-typeable H. influenzae strain 12, followed by
selection for kanamycin resistant colonies. Southern
analysis of a series of these colonies demonstrated two
populations of transformants, one with an insertion in
the HMW1 structural gene and the other with an insertion
in the HMW2 structural gene. One mutant from each of
these classes was selected for further studies.

Mutants deficient in expression of both proteins
were recovered using the following protocol. After
deletion of the 2.1-kb fragment of DNA between two EcoRI
sites spanning the 3'-portion of the HMW1 structural gene
and the 5'-portion of a downstream gene encoding an
accessory processing protein in pHMW1-15, the kanamycin
cassette from pUC4K was inserted as a 1.3-kb EcoRI
fragment. The resulting plasmid (pHMW1-16) was
linearized by digestion with XbaI and transformed into
strain 12, followed again by selection for kanamycin
resistant colonies. Southern analysis of a
representative sampling of these colonies demonstrated
that in seven of eight cases, insertion into both the
HMW1 and HMW2 loci had occurred. One such mutant was
selected for further studies.

To confirm the intended phenotypes, the mutant
strains were examined by Western blot analysis with a
polyclonal antiserum against recombinant HMW1 protein.
The parental strain expressed both the 125-kD HMW1 and
the 120-kD HMW2 protein (Figure 16). In contrast, the
HMW2- mutant failed to express the 120-kD protein, and the
HMW1- mutant failed to express the 125-kD protein. The
double mutant lacked expression of either protein. On
the basis of whole cell lysates, outer membrane profiles,
and colony morphology, the wild type strain and the
mutants were otherwise identical with one another.
Transmission electron microscopy demonstrated that none
of the four strains expressed pili.

The capacity of wild type strain 12 to adhere to
Chang epithelial cells was examined. In such assays,
bacteria were inoculated into broth and allowed to grow
to a density of ~2 x 10^9 cfu/ml. Approximately 2 x 10^7
cfu were inoculated onto epithelial cell monolayers, and
plates were gently centrifuged at 165 x g for 5 minutes
to facilitate contact between bacteria and the epithelial
surface. After incubation for 30 minutes at 37°C in 5%
CO2, monolayers were rinsed 5 times with PBS to remove
nonadherent organisms and were treated with trypsin-EDTA
(0.05% trypsin, 0.5% EDTA) in PBS to release them from
the plastic support. Well contents were agitated, and
dilutions were plated on solid medium to yield the number
of adherent bacteria per monolayer. Percent adherence
was calculated by dividing the number of adherent cfu per
monolayer by the number of inoculated cfu.
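
The percent adherence calculation described above may be
sketched as follows. This Python example is illustrative
only: the function names, the plate-count back-calculation
and all numerical values are assumptions introduced for
the example and are not data from the Tables.

```python
def cfu_per_monolayer(colonies: int, dilution_factor: float,
                      plated_volume_ml: float,
                      well_volume_ml: float) -> float:
    """Back-calculate adherent cfu in a well from a dilution plate count.
    All four arguments are hypothetical assay parameters."""
    if plated_volume_ml <= 0 or well_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    return colonies * dilution_factor * (well_volume_ml / plated_volume_ml)


def percent_adherence(adherent_cfu: float, inoculated_cfu: float) -> float:
    """Adherent cfu per monolayer divided by inoculated cfu, as a percentage."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated cfu must be positive")
    return 100.0 * adherent_cfu / inoculated_cfu


# Hypothetical example: 180 colonies from 0.1 ml of a 10^-4 dilution
# of a 1 ml well, with a hypothetical inoculum of 2 x 10^7 cfu.
adherent = cfu_per_monolayer(180, 1e4, 0.1, 1.0)
print(f"adherent cfu per monolayer: {adherent:.3g}")
print(f"percent adherence: {percent_adherence(adherent, 2e7):.1f}%")
```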

As depicted in Table 1 below (the Tables appear at
the end of the descriptive text), this strain adhered
quite efficiently, with nearly 90% of the inoculum
binding to the monolayer. Adherence by the mutant
expressing HMW1 but not HMW2 (HMW2-) was also quite
efficient and comparable to that by the wild type strain.
In contrast, attachment by the strain expressing HMW2 but
deficient in expression of HMW1 (HMW1-) was decreased
about 15-fold relative to the wild type. Adherence by
the double mutant (HMW1-/HMW2-) was decreased even
further, approximately 50-fold compared with the wild
type and approximately 3-fold compared with the HMW1-
mutant. Considered together, these results suggest that
both the HMW1 protein and the HMW2 protein influence
attachment to Chang epithelial cells. Interestingly,
optimal adherence to this cell line appears to require
HMW1 but not HMW2.
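
The fold decreases quoted above are ratios of percent
adherence values. The short Python sketch below shows the
arithmetic. The percent adherence figures in it are
hypothetical placeholders chosen only to give ratios of
the same order as those stated in the text; they are not
the values of Table 1.

```python
def fold_decrease(reference_percent: float, test_percent: float) -> float:
    """Ratio of reference adherence to test adherence (values in percent)."""
    if test_percent <= 0:
        raise ValueError("test adherence must be positive")
    return reference_percent / test_percent


# Hypothetical percent adherence values (NOT taken from Table 1).
adherence = {
    "wild type": 90.0,
    "HMW1-": 6.0,
    "HMW1-/HMW2-": 1.8,
}

wt = adherence["wild type"]
for strain in ("HMW1-", "HMW1-/HMW2-"):
    ratio = fold_decrease(wt, adherence[strain])
    print(f"{strain}: {ratio:.1f}-fold lower than wild type")

double_vs_single = fold_decrease(adherence["HMW1-"],
                                 adherence["HMW1-/HMW2-"])
print(f"double mutant vs HMW1-: {double_vs_single:.1f}-fold lower")
```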
Example 4:

This Example illustrates the preparation and
expression of HMW3 and HMW4 proteins and their function
as adhesins.

Using the plasmids pHMW1-16 and pHMW1-17 (see
Example 3) and following a scheme similar to that
employed with strain 12 as described in Example 3, three
non-typeable Haemophilus strain 5 mutants were isolated,
including one with the kanamycin gene inserted into the
hmw1-like (designated hmw3) locus, a second with an
insertion in the hmw2-like (designated hmw4) locus, and
a third with insertions in both loci. As predicted,
Western immunoblot analysis demonstrated that the mutant
with insertion of the kanamycin cassette into the hmw1-
like locus had lost expression of the HMW3 125-kD
protein, while the mutant with insertion into the hmw2-
like locus failed to express the HMW4 123-kD protein.
The mutant with a double insertion was unable to express
either of the high molecular weight proteins.

As shown in Table 1 below, wild type strain 5
demonstrated high level adherence, with almost 80% of the
inoculum adhering per monolayer. Adherence by the mutant
deficient in expression of the HMW2-like protein (i.e.
HMW4 protein) was also quite high. In contrast,
adherence by the mutant unable to express the HMW1-like
protein (i.e. HMW3 protein) was reduced about 5-fold
relative to the wild type, and attachment by the double
mutant was diminished even further (approximately 25-
fold). Examination of Giemsa-stained samples confirmed
these observations (not shown). Thus, the results with
strain 5 for proteins HMW3 and HMW4 corroborate the
findings with strain 12 and the HMW1 and HMW2 proteins.
Example 5:

This Example contains additional data concerning the
adhesin properties of the HMW1 and HMW2 proteins.

To confirm an adherence function for the HMW1 and
HMW2 proteins and to examine the effect of HMW1 and HMW2
independently of other H. influenzae surface structures,
the hmw1 and the hmw2 gene clusters were introduced into
E. coli DH5α, using plasmids pHMW1-14 and pHMW2-21,
respectively. As a control, the cloning vector, pT7-7,
was also transformed into E. coli DH5α. Western blot
analysis demonstrated that E. coli DH5α containing the
hmw1 genes expressed a 125-kDa protein, while the same
strain harboring the hmw2 genes expressed a 120-kDa
protein. E. coli DH5α containing pT7-7 failed to react
with antiserum against recombinant HMW1. Transmission
electron microscopy revealed no pili or other surface
appendages on any of the E. coli strains.

Adherence by the E. coli strains was quantitated and
compared with adherence by wild type non-typeable H.
influenzae strain 12. As shown in Table 2 below,
adherence by E. coli DH5α containing vector alone was
less than 1% of that for strain 12. In contrast, E. coli
DH5α harboring the hmw1 gene cluster demonstrated
adherence levels comparable to those for strain 12.
Adherence by E. coli DH5α containing the hmw2 genes was
approximately 6-fold lower than attachment by strain 12
but was increased 20-fold over adherence by E. coli DH5α
with pT7-7 alone. These results indicate that the HMW1
and HMW2 proteins are capable of independently mediating
attachment to Chang conjunctival cells. These results
are consistent with the results with the H. influenzae
mutants reported in Examples 3 and 4, providing further
evidence that, with Chang epithelial cells, HMW1 is a
more efficient adhesin than is HMW2.

Experiments with E. coli HB101 harboring pT7-7,
pHMW1-14, or pHMW2-21 confirmed the results obtained with
the DH5α derivatives (see Table 2).
Example 6:

This Example illustrates the copurification of HMW1
and HMW2 proteins from wild-type non-typeable H.
influenzae strain.

HMW1 and HMW2 were isolated and purified from non-
typeable H. influenzae (NTHI) strain 12 in the following
manner. Non-typeable Haemophilus bacteria from frozen
stock culture were streaked onto a chocolate plate and
grown overnight at 37°C in an incubator with 5% CO2. A
50 ml starter culture of brain heart infusion (BHI) broth,
supplemented with 10 µg/ml each of hemin and NAD, was
inoculated with growth on chocolate plate. The starter
culture was grown until the optical density (O.D. -
600 nm) reached 0.6 to 0.8 and then the bacteria in the
starter culture were used to inoculate six 500 ml flasks
of supplemented BHI using 8 to 10 ml per flask. The
bacteria were grown in 500 ml flasks for an additional 5
to 6 hours at which time the O.D. was 1.5 or greater.
Cultures were centrifuged at 10,000 rpm for 10 minutes.
Bacterial pellets were resuspended in a total volume
of 250 ml of an extraction solution comprising 0.5 M
NaCl, 0.01 M Na2EDTA, 0.01 M Tris, 50 µM 1,10-
phenanthroline, pH 7.5. The cells were not sonicated or
otherwise disrupted. The resuspended cells were allowed
to sit on ice at 0°C for 60 minutes. The resuspended
cells were centrifuged at 10,000 rpm for 10 minutes at
4°C to remove the majority of intact cells and cellular
debris. The supernatant was collected and centrifuged at
100,000 x g for 60 minutes at 4°C. The supernatant again
was collected and dialyzed overnight at 4°C against 0.01
M sodium phosphate, pH 6.0.
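
The centrifugation steps above are stated partly in rpm
and partly in multiples of g. The two are related by the
standard expression RCF = 1.118 x 10^-5 x r x (rpm)^2,
where r is the rotor radius in centimetres. The Python
sketch below applies that relation. The rotor radii used
are hypothetical, since no rotor is specified in the
text, so the printed figures indicate only the order of
magnitude.

```python
import math

RCF_CONSTANT = 1.118e-5  # for radius in cm and speed in rpm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if radius_cm <= 0:
        raise ValueError("radius must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor radii; the text does not identify the rotors used.
print(f"10,000 rpm at r = 10 cm: {rcf_from_rpm(10_000, 10.0):,.0f} x g")
print(f"100,000 x g at r = 8 cm: {rpm_from_rcf(100_000, 8.0):,.0f} rpm")
```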
The sample was centrifuged at 10,000 rpm for 10
minutes at 4°C to remove insoluble debris precipitated
from solution during dialysis. The supernatant was
applied to a 10 ml CM Sepharose column which had been
pre-equilibrated with 0.01 M sodium phosphate, pH 6.
Following application to this column, the column was
washed with 0.01 M sodium phosphate. Proteins were
eluted from the column with a 0 - 0.5 M KCl gradient in
0.01 M Na phosphate, pH 6 and fractions were collected
for gel examination. Coomassie gels of column fractions
were carried out to identify those fractions containing
high molecular weight proteins. The fractions containing
high molecular weight proteins were pooled and
concentrated to a 1 to 3 ml volume in preparation for
application of sample to gel filtration column.

A Sepharose CL-4B gel filtration column was
equilibrated with phosphate-buffered saline, pH 7.5. The
concentrated high molecular weight protein sample was
applied to the gel filtration column and column fractions
were collected. Coomassie gels were performed on the
column fractions to identify those containing high
molecular weight proteins. The column fractions
containing high molecular weight proteins were pooled.
Example 7:

This Example illustrates the use of specified HMW1
and HMW2 proteins in immunization studies.

The copurified HMW1 and HMW2 proteins prepared as
described in Example 6 were tested to determine whether
they would protect against experimental otitis media
caused by the homologous strain.

Healthy adult chinchillas, 1 to 2 years of age with
weights of 350 to 500 g, received three monthly
subcutaneous injections with 40 µg of an HMW1-HMW2
protein mixture in Freund's adjuvant. Control animals
received phosphate-buffered saline in Freund's adjuvant.
One month after the last injection, the animals were
challenged by intrabullar inoculation with 300 cfu of
NTHI strain 12.

Middle ear infection developed in 5 of 5 control
animals versus 5 of 10 immunized animals. Although only
5 of 10 chinchillas were protected in this test, the test
conditions are very stringent, requiring bacteria to be
injected directly into the middle ear space and to
proliferate in what is in essence a small abscess cavity.
As seen from the additional data below, complete
protection of chinchillas can be achieved.

The five HMW1/HMW2-immunized animals that did not
develop otitis media demonstrated no signs of middle ear
inflammation when examined by otoscopy nor were middle
ear effusions detectable.

Among the five HMW1/HMW2-immunized animals that
became infected, the total duration of middle ear
infection as assessed by the persistence of culture-
positive middle ear fluid was not different from
controls. However, the degree of inflammation of the
tympanic membranes was subjectively less in the
HMW1/HMW2-immunized animals. When quantitative bacterial
counts were performed on the middle ear fluid specimens
recovered from infected animals, notable differences were
apparent between the HMW1/HMW2-immunized and PBS-
immunized animals (Figure 17). Shown in Figure 17 are
quantitative middle ear fluid bacterial counts from
animals on day 7 post-challenge, a time point associated
with the maximum colony counts in middle ear fluid. The
data were log-transformed for purpose of statistical
comparison. The data from the control animals are shown
on the left and data from the high molecular weight
protein immunized animals on the right. The two
horizontal lines indicate the respective means and
standard deviations of middle ear fluid colony counts
for only the infected animals in each group. As can be
seen from this Figure, the HMW1/HMW2-immunized animals
had significantly lower middle ear fluid bacterial counts
than the PBS-immunized controls, geometric means of 7.4
X 10* and 1.3 X 10^, respectively (p=0.02, Student's t-
test).
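
The statistical treatment referred to above, namely log-
transformation of colony counts followed by comparison of
group means with Student's t-test, may be sketched in
Python as follows. The sketch is illustrative only: the
colony counts are hypothetical placeholders and are not
the data of Figure 17, the function names are assumptions,
and only the t statistic and degrees of freedom are
computed (a p value would additionally require a t
distribution table or a statistics package).

```python
import math
import statistics


def log10_transform(counts):
    """Log-transform colony counts (all counts must be positive)."""
    return [math.log10(c) for c in counts]


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance.
    Returns (t, degrees_of_freedom)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err, df


# Hypothetical middle ear fluid counts in cfu/ml (NOT the Figure 17 data).
control = [2.0e6, 5.0e6, 8.0e6, 1.0e7, 3.0e6]
immunized = [4.0e4, 2.0e5, 9.0e4, 6.0e5, 1.5e5]

print(f"geometric mean, control:   {statistics.geometric_mean(control):.3g}")
print(f"geometric mean, immunized: {statistics.geometric_mean(immunized):.3g}")

t, df = pooled_t_statistic(log10_transform(control),
                           log10_transform(immunized))
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The geometric mean of the raw counts equals 10 raised to
the arithmetic mean of the log10-transformed counts, which
is why the comparison is carried out on the transformed
data.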

Serum antibody titres following immunization were
comparable in uninfected and infected animals. However,
infection in immunized animals was uniformly associated
with the appearance of bacteria down-regulated in
expression of the HMW proteins, suggesting bacterial
selection in response to immunologic pressure.
Although these data show that protection following
immunization was not complete, they suggest that the HMW
adhesin proteins are potentially important protective
antigens which may comprise one component of a multi-
component NTHI vaccine.

In addition, complete protection has been achieved
in the chinchilla model at lower dosage challenge, as set
forth in Table 3 below.

Groups of five animals were immunized with 20 µg of
the HMW1-HMW2 mixture prepared as described in Example 6
on days 1, 28 and 42 in the presence of alum. Blood
samples were collected on day 53 to monitor the antibody
response. On day 56, the left ear of animals was
challenged with about 10 cfu of H. influenzae strain 12.
Ear infection was monitored on day 4. Four animals in
Group 3 had been infected previously by H. influenzae
strain 12 and had recovered completely for at least one
month before the second challenge.
Example 8:

This Example illustrates the provision of synthetic
peptides corresponding to a portion only of the HMW1
protein.

A number of synthetic peptides were derived from
HMW1. Antisera then were raised to these peptides. The
anti-peptide antiserum to peptide HMW1-P5 was shown to
recognize HMW1. Peptide HMW1-P5 covers amino acids 1453
to 1481 of HMW1, has the sequence
VDEVIEAIO^ILEKVKDLiSDEEREALAKIiG (SEQ ID No: 11), and
represents bases 1498 to 1576 in Figure 10.

This finding demonstrates that the DNA sequence and
the derived protein are being interpreted in the correct
reading frame and that peptides derived from the sequence
can be produced which will be immunogenic.
Example 9:

This Example describes the generation of monoclonal
antibodies to the high molecular weight proteins of non-
typeable H. influenzae.

Monoclonal antibodies were generated using
standard techniques. In brief, female BALB/c mice (4 to
6 weeks old) were immunized by intraperitoneal injection
with high molecular weight proteins purified from
non-typeable Haemophilus strain 5 or strain 12, as
described in Example 6. The first injection of 40 to 50
µg of protein was administered with Freund's complete
adjuvant and the second dose, received four to five weeks
after the first, was administered with phosphate-buffered
saline. Three days following the second injection, the
mice were sacrificed and splenic lymphocytes were fused
with SP2/0-Ag14 plasmacytoma cells.

Two weeks following fusion, hybridoma supernatants
were screened for the presence of high molecular weight
protein specific antibodies by a dot-blot assay.
Purified high molecular weight proteins at a
concentration of 10 µg per ml in TRIS-buffered saline
(TBS) were used to sensitize nitrocellulose sheets (Bio-
Rad Laboratories, Richmond, CA) by soaking for 20
minutes. Following a blocking step with TBS-3% gelatin,
the nitrocellulose was incubated for 60 minutes at room
temperature with individual hybridoma supernatants, at a
1:5 dilution in TBS-0.1% Tween, using a 96-well Bio-Dot
micro-filtration apparatus (Bio-Rad). After washing, the
sheets were incubated for one hour with alkaline-
phosphatase-conjugated affinity isolated goat anti-(mouse
IgG + IgM) antibodies (Tago, Inc., Burlingame, CA).
Following additional washes, positive supernatants were
identified by incubation of the nitrocellulose sheet in
alkaline phosphatase buffer (0.10 M TRIS, 0.10 M NaCl,
0.005 M MgCl2) containing nitroblue tetrazolium (0.1
mg/ml) and 5-bromo-4-chloro-3-indolyl phosphate (BCIP)
(0.05 mg/ml).

For the antibody isotyping and immunoelectron
microscopy studies to be described below, the monoclonal
antibodies were purified from hybridoma supernatants.
The antibodies recovered in this work were all of the IgG
class. To purify the monoclonal antibodies, the
hybridoma supernatants were first subjected to ammonium
sulfate precipitation (50% final concentration at 0°C).
Following overnight incubation, the precipitate was
recovered by centrifugation and resolubilized in
phosphate-buffered saline. The solution was then
dialyzed overnight against 0.01 M sodium phosphate
buffer, pH 6.0. The following day the sample was applied
to a DEAE-Sephacel column preequilibrated with the same
phosphate buffer and the proteins were subsequently
eluted with a KCl gradient. Column fractions containing
the monoclonal antibodies were identified by examination
of samples on Coomassie gels for protein bands typical of
light and heavy chains.

The isotype of each monoclonal antibody was
determined by immunodiffusion using the Ouchterlony
method. Immunodiffusion plates were prepared on glass
slides with 10 ml of 1% DNA-grade agarose (FMC
Bioproducts, Rockland, ME) in phosphate-buffered saline.
After the agarose solidified, 5-mm wells were punched
into the agarose in a circular pattern. The center well
contained a concentrated preparation of the monoclonal
antibody being evaluated and the surrounding wells
contained goat anti-mouse subclass-specific antibodies
(Tago). The plates were incubated for 48 hours in a
humid chamber at 4°C and then examined for white lines of
immunoprecipitation.

Hybridoma supernatants which were reactive in the
dot-blot assay described above were examined by Western
blot analysis, both to confirm the reactivity with the
high molecular weight proteins of the homologous
nontypable Haemophilus strain and to examine the cross-
reactivity with similar proteins in heterologous strains.
Nontypable Haemophilus influenzae cell sonicates
containing 100 µg of total protein were solubilized in
electrophoresis sample buffer, subjected to SDS-
polyacrylamide gel electrophoresis on 7.5% acrylamide
gels, and transferred to nitrocellulose using a Genie
electrophoretic blotter (Idea Scientific Company,
Corvallis, OR) for 45 min at 24 V. After transfer, the
nitrocellulose sheet was blocked and then probed
sequentially with the hybridoma supernatant and with
alkaline phosphatase-conjugated goat-anti(mouse IgG +
IgM) second antibody, and finally bound antibodies were
detected by incubation with nitroblue tetrazolium/BCIP
solution. This same assay was employed to examine the
reactivity of the monoclonals with recombinant fusion
proteins expressed in E. coli (see below).

In preparation for immunoelectron microscopy,
bacteria were grown overnight on supplemented chocolate
agar and several colonies were suspended in phosphate-
buffered saline containing 1% albumin. A 20-µl drop of
this bacterial suspension was then applied to a carbon-
coated grid and incubated for 2 min. Excess fluid was
removed and the specimen was then incubated for 5 min
with the purified high molecular weight protein-specific
monoclonal antibody being analyzed. Following removal of
excess liquid and a wash with phosphate-buffered saline,
the specimen was incubated with anti-mouse IgG conjugated
to 10-nm colloidal gold particles. Following final
washes with phosphate-buffered saline, the sample was
rinsed with distilled water. Staining of the bacterial
cells was performed with 0.5% uranyl acetate for 1 min.
Samples were then examined in a Philips 201c electron
microscope.

Fourteen different hybridomas were recovered which
produced monoclonal antibodies reactive with the purified
HMW1 and HMW2 proteins of nontypable Haemophilus strain
12 in the immunoblot screening assay. Of the monoclonals
screened by immunoelectron microscopy to date, as
described below, two were demonstrated to bind surface
epitopes on prototype strain 12. These two monoclonal
antibodies, designated AD6 (ATCC ) and 10C5 (ATCC ),
were both of the IgG1 subclass.

Example 10:

This Example describes the identification of
surface-exposed B-cell epitopes of high molecular weight
proteins of non-typeable H. influenzae.

To map epitopes recognized by the monoclonal
antibodies, their reactivity with a panel of recombinant
fusion proteins expressed by pGEMEX® recombinant plasmids
was examined. These plasmids were constructed by cloning
various segments of the hmw1A or hmw2A structural genes
into T7 expression vectors pGEMEX®-1 and pGEMEX®-2
(Promega Corporation, Madison, WI). Shown in Figures 18
and 19 are the schematic diagrams depicting the segments
derived from the hmw1 and hmw2 gene clusters cloned into
the pGEMEX® expression plasmids. These segments were
inserted such that in-frame fusions were created at each
junction site. Thus, these plasmids encode recombinant
fusion proteins containing pGEMEX®-encoded T7 gene 10
amino acids in the regions indicated by the hatched bars
and hmw1A or hmw2A encoded amino acids in the regions
indicated by the black bars in these Figures. A stop
codon is present at the junction of the black and white
segments of each bar.
Four discrete sites within the hmw1A structural gene
were selected as the 5' ends of the hmw1 inserts. For
each 5' end, a series of progressively smaller inserts
was created by taking advantage of convenient downstream
restriction sites. The first recombinant plasmid
depicted in Figure 18 was constructed by isolating a 4.9
kbp BamHI-HindIII fragment from pHMW1-14 (Example 1,
Figure 5A), which contains the entire hmw1 gene cluster,
and inserting it into BamHI-HindIII digested pGEMEX®-1.
The second recombinant plasmid in this set was
constructed by digesting the "parent" plasmid with
BstEII-HindIII, recovering the 6.8 kbp larger fragment,
blunt-ending with Klenow DNA polymerase, and religating.
The third recombinant plasmid in this set was constructed
by digesting the "parent" plasmid with ClaI-HindIII,
recovering the 6.0 kbp larger fragment, blunt-ending, and
religating. The next set of four hmw1 recombinant
plasmids was derived from a "parent" plasmid constructed
by ligating a 2.2 kbp EcoRI fragment from the hmw1 gene
cluster into EcoRI-digested pGEMEX®-2. The other three
recombinant plasmids in this second set were constructed
by digesting at downstream BstEII, EcoRV, and ClaI sites,
respectively, using techniques similar to those just
described. The third set of three recombinant plasmids
depicted was derived from a "parent" plasmid constructed
by double-digesting the first recombinant plasmid
described above (i.e. the one containing the 4.9 kbp
BamHI-HindIII fragment) with BamHI and ClaI, blunt-
ending, and religating. This resulted in a construct
encoding a recombinant protein with an in-frame fusion at
the ClaI site of the hmw1A gene. The remaining two
plasmids in this third set were constructed by digesting
at downstream BstEII and EcoRV sites, respectively.
Finally, the fourth set of two recombinant plasmids was
derived from a "parent" plasmid constructed by double-
digesting the original BamHI-HindIII construct with
HincII and EcoRV, then religating. This resulted in a
construct encoding a recombinant protein with an in-frame
fusion at the EcoRV site of the hmw1A gene. The
remaining plasmid in this fourth set was constructed by
digesting at the downstream BstEII site.

Three discrete sites within the hmw2A structural gene
were selected as the 5' ends of the hmw2 inserts. The
first recombinant plasmid depicted in Figure 19 was
constructed by isolating a 6.0 kbp EcoRI-XhoI fragment
from pHMW2-21, which contains the entire hmw2 gene
cluster, and inserting it into EcoRI-SalI digested
pGEMEX®-1. The second recombinant plasmid in this set
was constructed by digesting at an MluI site near the 3'
end of the hmw2A gene. The second set of two hmw2
recombinant plasmids was derived from a "parent" plasmid
constructed by isolating a 2.3 kbp HindIII fragment from
pHMW2-21 and inserting it into HindIII-digested
pGEMEX®-2. The remaining plasmid in this second set was
constructed by digesting at the downstream MluI site.
Finally, the last plasmid depicted was constructed by
isolating a 1.2 kbp HincII-HindIII fragment from the
indicated location in the hmw2 gene cluster and inserting
it into HincII-HindIII digested pGEMEX®-1.

Each of the recombinant plasmids was used to
transform E. coli strain JM101. The resulting
transformants were used to generate the recombinant
fusion proteins employed in the mapping studies. To
prepare recombinant proteins, the transformed E. coli
strains were grown to an A600 of 0.5 in L broth containing
50 µg of ampicillin per ml. IPTG was then added to 1 mM
and mGP1-2, the M13 phage containing the T7 RNA
polymerase gene, was added at a multiplicity of infection
of 10. One hour later, cells were harvested, and a
sonicate of the cells was prepared. The protein
concentrations of the samples were determined and cell
sonicates containing 100 µg of total protein were
solubilized in electrophoresis sample buffer, subjected
to SDS-polyacrylamide gel electrophoresis, and examined
on Coomassie gels to assess the expression level of
recombinant fusion proteins. Once high levels of
expression of the recombinant fusion proteins were
confirmed, the cell sonicates were used in the Western
blot analyses described above.

Shown in Figure 20 is an electron micrograph
demonstrating surface binding of Mab AD6 to
representative nontypable Haemophilus influenzae strains.
In the upper left panel of the Figure is nontypable
Haemophilus strain 12 and in the upper right panel is a
strain 12 derivative which no longer expressed the high
molecular weight proteins. As can be seen, colloidal
gold particles decorate the surface of strain 12,
indicating bound AD6 antibody on the surface. In
contrast, no gold particles are evident on the surface of
the strain 12 mutant which no longer expresses the high
molecular weight proteins. These results indicate that
monoclonal antibody AD6 is recognizing a surface-exposed
epitope on the high molecular weight proteins of strain
12. Analogous studies were performed with monoclonal
antibody 10C5, demonstrating that it too bound to surface-
accessible epitopes on the high molecular weight HMW1 and
HMW2 proteins of strain 12.

Having identified two surface-binding monoclonals,
the epitope which each monoclonal recognized was mapped.
To accomplish this task, the two sets of recombinant
plasmids containing various portions of either the hmw1A
or hmw2A structural genes (Figures 18 and 19) were
employed. With these complementary sets of recombinant
plasmids, the epitopes recognized by the monoclonal
antibodies were mapped to relatively small regions of the
very large HMW1 and HMW2 proteins.

To localize epitopes recognized by Mab AD6, the
pattern of reactivity of this monoclonal antibody with a
large set of recombinant fusion proteins was examined.
Figure 21 is a Western blot which demonstrates the
pattern of reactivity of Mab AD6 with five recombinant
fusion proteins, a relevant subset of the larger number
originally examined. From analysis of the pattern of
reactivity of Mab AD6 with this set of proteins, one is
able to map the epitope it recognizes to a very short
segment of the HMW1 and HMW2 proteins. A brief summary
of this analysis follows. For reference, the relevant
portions of the hmw1A or hmw2A structural genes which
were expressed in the recombinant proteins being examined
are indicated in the diagram at the top of the figure.
As shown in lane 1, Mab AD6 recognizes an epitope encoded
by fragment 1, a fragment which encompasses the distal
one-fourth of the hmw1A gene. Reactivity is lost when
only the portion of the gene comprising fragment 2 is
expressed. This observation localizes the AD6 epitope
somewhere within the last 180 amino acids at the carboxy-
terminal end of the HMW1 protein. Mab AD6 also
recognizes an epitope encoded by fragment 3, derived from
the hmw2A structural gene. This is a rather large
fragment which encompasses nearly one-third of the gene.
Reactivity is lost when fragment 4 is expressed. The
only difference between fragments 3 and 4 is that the
last 225 base pairs at the 3' end of the hmw2A structural
gene were deleted in the latter construct. This
observation indicates that the AD6 epitope is encoded by
this short terminal segment of the hmw2A gene. Strong
support for this idea is provided by the demonstrated
binding of Mab AD6 to the recombinant protein encoded by
fragment 5, a fragment encompassing the distal one-tenth
of the hmw2A structural gene. Taken together, these data
identify the AD6 epitope as common to both the HMW1 and
HMW2 proteins and place its location within 75 amino acids
of the carboxy termini of the two proteins.
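
The deletion-mapping arithmetic can be made explicit. The Python sketch below converts the 225 base pairs missing from fragment 4 into amino acid residues and combines that window with the 180-residue window inferred for HMW1. Taking the narrower of the two windows assumes that the shared AD6 epitope occupies equivalent carboxy-terminal positions in both proteins, which is the reading adopted in the text; the function and variable names are illustrative only.

```python
def bp_to_residues(base_pairs: int) -> int:
    """Number of whole codons (amino acid residues) in a coding segment."""
    if base_pairs % 3 != 0:
        raise ValueError("coding segment length should be a multiple of 3")
    return base_pairs // 3


# HMW2: fragment 3 reacts with Mab AD6; fragment 4, which lacks the last
# 225 bp of the hmw2A structural gene, does not.
hmw2_window = bp_to_residues(225)

# HMW1: reactivity with fragment 1 but not fragment 2 limits the epitope
# to the last 180 residues (value taken from the text).
hmw1_window = 180

# Assumption: a shared epitope lies within the narrower C-terminal window.
shared_window = min(hmw1_window, hmw2_window)

print(hmw2_window, shared_window)
```

Under these assumptions the 225 bp deletion corresponds to 75 residues, which is the figure stated for the location of the AD6 epitope.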

Figure 22 is a Western blot demonstrating the
pattern of reactivity of Mab 10C5 with the same five
recombinant fusion proteins examined in Figure 21. As
shown in lane 1, Mab 10C5 recognizes an epitope encoded
by fragment 1. In contrast to Mab AD6, Mab 10C5 also
recognizes an epitope encoded by fragment 2. Also in
contrast to Mab AD6, Mab 10C5 does not recognize any of
the hmw2A-derived recombinant fusion proteins. Thus,
these data identify the 10C5 epitope as being unique to
the HMW1 protein and as being encoded by the fragment
designated as fragment 2 in this figure. This fragment
corresponds to a 155-amino acid segment encoded by the
EcoRV-BstEII segment of the hmw1A structural gene.

Having identified the approximate locations of the
epitopes on HMW1 and HMW2 recognized by the two
monoclonals, the extent to which these epitopes were
shared by the high molecular weight proteins of
heterologous nontypable Haemophilus strains was next
determined. When examined in Western blot assays with
bacterial cell sonicates, Mab AD6 was reactive with
epitopes expressed on the high molecular weight proteins
of 75% of the inventor's collection of more than 125
nontypable Haemophilus influenzae strains. In fact, this
monoclonal appeared to recognize epitopes expressed on
high molecular weight proteins in virtually all
nontypable Haemophilus strains which we previously
identified as expressing HMW1/HMW2-like proteins. Figure
23 is an example of a Western blot demonstrating the
reactivity of Mab AD6 with a representative panel of such
heterologous strains. As can be seen, the monoclonal
antibody recognizes one or two bands in the 100 to 150
kDa range in each of these strains. For reference, the
strain shown in lane 1 is prototype strain 12 and the two
bands visualized represent HMW1 and HMW2 as the upper and
lower immunoreactive bands, respectively.

In contrast to the broad cross-reactivity observed
with Mab AD6, Mab 10C5 was much more limited in its
ability to recognize high molecular weight proteins in
heterologous strains. Mab 10C5 recognized high molecular
weight proteins in approximately 40% of the strains which
expressed HMW1/HMW2-like proteins. As was the case with
Mab AD6, Mab 10C5 did not recognize proteins in any of the
nontypable Haemophilus strains which did not express
HMW1/HMW2-like proteins.

In a limited fashion, the reactivity of Mab AD6 with
surface-exposed epitopes on the heterologous strains has
been examined. In the bottom two panels of Figure 20 are
electron micrographs demonstrating the reactivity of Mab
AD6 with surface-accessible epitopes on nontypable
Haemophilus strains 5 and 15. As can be seen, abundant
colloidal-gold particles are evident on the surfaces of
each of these strains, confirming their surface
expression of the AD6 epitope. Although limited in
scope, these data suggest that the AD6 epitope may be a
common surface-accessible epitope on the high molecular
weight adhesion proteins of most nontypable Haemophilus
influenzae which express HMW1/HMW2-like proteins.

SUMMARY OF DISCLOSURE

In summary of this disclosure, the present invention
provides high molecular weight proteins of non-typeable
Haemophilus, genes coding for the same and vaccines
incorporating such proteins. Modifications are possible
within the scope of this invention.



WO 97/36914                                   PCT/US97/04707



TABLE 1: Effect of mutation of high molecular weight
proteins on adherence to Chang epithelial cells by
nontypable H. influenzae.

                                    ADHERENCE*
Strain                   % Inoculation     Relative to
                                           wild type†

Strain 12 derivatives
  wild type               87.7 ± 5.9       100.0 ± 6.7
  HMW1- mutant             6.0 ± 0.9         6.8 ± 1.0
  HMW2- mutant            89.9 ± 10.8      102.5 ± 12.3
  HMW1-/HMW2- mutant       2.0 ± 0.3         2.3 ± 0.3

Strain 5 derivatives
  wild type               78.7 ± 3.2       100.0 ± 4.1
  HMW1-like mutant        15.7 ± 2.6        19.9 ± 3.3
  HMW2-like mutant       103.7 ± 14.0      131.7 ± 17.8
  double mutant            3.5 ± 0.6         4.4 ± 0.8

* Numbers represent mean (± standard error of the mean)
of measurements in triplicate or quadruplicate from
representative experiments.

† Adherence values for strain 12 derivatives are
relative to strain 12 wild type; values for strain 5
derivatives are relative to strain 5 wild type.

TABLE 2: Adherence by E. coli DH5α and HB101 harboring
hmw1 or hmw2 gene clusters.

Strain*                  Adherence relative to
                         H. influenzae strain 12†

DH5α (pT7-7)               0.7 ± 0.02
DH5α (pHMW1-14)          114.2 ± 15.9
DH5α (pHMW2-21)           14.0 ± 3.7
HB101 (pT7-7)              1.2 ± 0.5
HB101 (pHMW1-14)          93.6 ± 15.8
HB101 (pHMW2-21)           3.6 ± 0.9

* The plasmid pHMW1-14 contains the hmw1 gene cluster,
while pHMW2-21 contains the hmw2 gene cluster; pT7-7 is
the cloning vector used in these constructs.

† Numbers represent the mean (± standard error of the
mean) of measurements made in triplicate from
representative experiments.

TABLE 3: Protective ability of HMW protein against
non-typeable H. influenzae challenge in chinchilla model

                                  Number of Animals Showed
                                  Positive Ear Infection
Group                   Total     Tympano-  Otoscopic    cfu of Bacteria
(#)    Antigens         Animals   gram      Examination  /10 µL

1      HMW              5         0         0            0
2      None             5         5         5            850-3200 (4/5)
3      Convalescent     4         0         0            0
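
The "relative to wild type" column of Table 1 follows from dividing each percent-of-inoculum value by the wild-type value for the same strain. A minimal Python sketch for the strain 5 derivatives, using the means printed in the table, is given below; the names percent_inoculum and relative_to_wild_type are illustrative, and because the inputs are already rounded, the last digit of a recomputed value can differ slightly from the printed column.

```python
# Mean adherence (% of inoculum) for the strain 5 derivatives in Table 1.
percent_inoculum = {
    "wild type": 78.7,
    "HMW1-like mutant": 15.7,
    "HMW2-like mutant": 103.7,
    "double mutant": 3.5,
}


def relative_to_wild_type(values: dict, reference: str = "wild type") -> dict:
    """Express each value as a percentage of the reference strain's value."""
    ref = values[reference]
    return {name: round(100.0 * v / ref, 1) for name, v in values.items()}


relative = relative_to_wild_type(percent_inoculum)
for name, value in relative.items():
    print(f"{name}: {value}")
```

On these inputs the HMW1-like mutant retains about one fifth of wild-type adherence and the double mutant under one twentieth, in line with the tabulated values.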
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(C) OPERATING SYSTEM: PC-IXDS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/617,697

(B) FILING DATE: 01-APR-1996

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/302,832

(B) FILING DATE: 05-OCT-1994

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US PCT/US93/02166

(B) FILING DATE: 16-MAR-1993

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Berkstresser, Jerry W

(B) REGISTRATION NUMBER: 22,651

(C) REFERENCE/DOCKET NUMBER: 1038-557

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (703) 415-0610

(B) TELEFAX: (703) 415-0813

(A) LENGTH: 5116 base pairs

(C) STRANDEDNESS: single

ACAGCGTTCT CTTAATACTA GTACAAACCC ACAATAAAAT ATX3ACAAACA ACAATTACAA 60

CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAATA GTATAAATCC GCCATATAAA 120

ATGGTATAAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC 180
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TTTCATCTTT 


CATCTTTCAT 


^ X X X V.ifX X \v X X 


1 L\. ill \JJ\ 


1 1 i X vJAx 


TTCATCTTTC 


240 


ACATGCCCTG 


ATGAACCGAG 






AATGAAGAGG 


GAGCTGAACG 


300 


AACGCAAATG 


ATAAAGTAAT 


X X AAX X Ktf X X ^ 




AG V9 AvjAAAAX 


ATGAACAAGC 


360 


TATATCGTCT 


CAAATTCAGC 




ATGCTTTGGT 


TGCTGTGTCT 


GAATTGGCAC 


420 


GGGGTTGTGA 


CCATTCCACA 




& & iL & a/'v^ 


xT^tTTCUCATG 


AAAGTGCGTC 


480 


ACTTAGCGTT 


AAAGCCACTT 


X VvWUav X>1XU X 


A ^^^^ & ^^^^^p^^^^ 

1 Xi^XL. XXI 


K^/^nv <*i*ft K 
AiKs xvjTAACA 


TCTATTCCAC 


540 




AGCAAGCX3GC 


X X X%V,.^WW9\J#W% 


X\»v» A 1 vjXAGT 


AwACvKiCACA 


GCCACTATGC 


600 


AAGTAGATGG 


TAATAAAACC 


'^X XaXwWUWA 


A%-A(jToXTGA 


CGATATCATT 


AATTGGAAAC 


660 


AATTTAACAT 


CGACCAAAAT 


GAAATGGTGr* 


AVa X X X X X A^.A 


AuAAAACAAVr 


AACTCCGCCG 


720 


TATTCAACCG 


TGTTACATCT 


AACCAAATCT* 


^Wt..AAX XAAA 


A WJQA X X X X A . 


GATTCTAACG 


7iB0 


GACAAGTCTT 


TTTAATCAAC 


CCAAATGOTA 


Tf^ A A A T A 
X k*A\»AA X AV7V9 


X AAAvaACJviCA 


ATTATTAACA 


840 


CTAATGGCTT 


TACGGCPTYTP 




TTTCXAACQA 


AAhCATCAAG 


GCQCGTAATT 


900 


TCACCTTCGA 


GCAAACCAAA 


van X #wvt.w^\v^ 


XwvaL. l\iAAAT 


TGTVaAATCAC 


GQTTTAATTA 


960 






^ 1 i\MA.XwXT A 


TTGGTGQCAA 


AGTGAAAAAC 


GAGGGTGTGA 


1020 






ATTTCTTTAC 


TCG CAGGGCA. 


AAAAATCACC 


ATCAGCGATA 


1080 




X X ^'^^ i ^^^^^ ^^^^ 


TACAuCATTXS 


CCGCGCCTGA 


AAATGAAGCG 


GTCAATCTGG 


1140 


^WwrVXAA X A X - 






ATGTCCGTGC 


TGCCACTATT 


CGAAACCAAG 


1200 


\» X X X Sv 


XAJS. X\7>%X X V. 1 


V* X, AM\j^,ZAAA)9 


ATAAAAGCGG 


CAATATTGTr 


CTTTCCGCCA 


1260 


AAGAGGGTGA 


^wV»%»w*wW%X X 




TxTv- wistrrcA 


m ft ft Mv Vft ^^%ft ft 

AAATCAGCAA 


GCTAAAGGCG 


1320 


GCAAGCTGAT 


GATTAC1/^GG(? 


X £\Jr%n\s X 


VJAX XAAAAAW 


AvtU 1\3CIAG X 1 


ATCGACCTTT 


1380 


CAGGTAAAGA 


AGGGGGAGAA 


•^W ^ A w X X v> 


GCW?Tf3AfY3A 

VVVvOXS XV>A\»\3A 




u^^TAAAAAGG 


1440 


GCATTCAATT 


AGCAAAGAAA 


'VwW X W X X x#v^ 


A A A A Art^2r*Trv^ 

AAAAA\jrV>^ X 


A AOOA*f^* 
AACvJAU LAAx 


GTATCAGGCA 


1500 


AAGAAAAAGG 


CGGACGCGCT 


ATTGTGTGGG 


GCGATATTGC! 


\3 X X An X X VJn^ 


WJ^-.AA^ AX X A 


^ C f A 


ACGCTCAAGG 


TA6TGGTQAT 


ATCGCTAAAA 


^^Ul\4XU\4X X X 


X \ J X \ J%9#UJv«,^X9 




lo2U 


ATTTATTCAT 


CAAAGACAAT 


GCAATTGTTG 


ACGCCAAAGA 


GTGGTTGTTA 


nA^VW2rtA*rA 
uA\..A.A» VSVaA X A 




ATGTATCTAT 


TAATGCAGAA 


ACAGCAGOAC 


GCAGCAATAC 


TTCAGAAGAC 


UA X \3 AA\L AvJA 


17*0 


CGGGATCCGG 


GAATAGTGCC 


AGCACCCCAA 


AACGAAACAA 


AGAAAAGACA 


Af^AHTTAAOAA 




ACACAACTCT 


TGAGAGTATA 


CTAAAAAAAG 


GTACCri"l\3T 


TAACATCACT 


ViwXAAXCAAw 




GCATCTATGT CAATAGCTCC ATTAATTTAT CCAATGGCAG CTTAACTCTT TGGAGTGAGG 1920

GTCGQAGCGG TGGCXKXXJIT GAGATTAACA ACGATATTAC CACCGGTGAT GATACCAGAG 1980

GTGCAAACTT AACAATTTAC TCAGGCGGCT GGGTTGATGT TCATAAAAAT ATCTCACXXX3 2040

GGGCGCAAGG TAACATAAAC ATTACAGCTA AACAAGATAT CGCCTTTGAG AAAGGAAGCA 2100

ACCAAGTCAT TACAGGTCAA GGGACTATTA CCTCAGGCAA TCAAAAAGGT TTTAGATTTA 2160

ATAATGTCTC TCTAAACGGC ACTGGCAGCG GACTGCAATT CACCACTAAA AGAACCAATA 2220

AATACGCTAT CACAAATAAA TTTGAAGGGA CTTTAAATAT TTCAGGGAAA GTGAACATCT 2280

CAATGGTTTT ACCTAAAAAT GAAAGTGGAT ATGATAAATT CAAAGGACGC ACTTACTGGA 2340

ATTTAACCTC CTTAAATGTT TCCGAGAGTG GCGAGTTTAA CCTCACTATT GACTCCAGAG 2400

GAAGCGATAG TGCAGGCACA CTTACCCAGC CTTATAATTT AAACGGTATA TCATTCAACA 2460

AAGACACTAC CTTTAATGTT GAACGAAATG CAAGAGTCAA CTTTGACATC AAGGCACCAA 2520

TAGGGATAAA TAAGTATTCT AGTTTGAATT ACGCATCATT TAATGGAAAC ATTTCAGTTT 2580

CGGGAGGGGG GAGTGTTGAT TTCACACTTC TCGCCTCATC CTCTAACGTC CAAACCCCCG 2640

GTGTAGTTAT AAATTCTAAA TACTTTAATG TTTCAACAGG GTCAAGTTTA AGATTTAAAA 2700

CTTCAGGCTC AACAAAAACT GGCTTCTCAA TAGAGAAAGA TTTAACTTTA AATGCCACCG 2760

GAGGCAACAT AACACTTTTG CAAGTTGAAG GCACCGATGG AATGATTGGT AAAGGCATTG 2820

TAGCCAAAAA AAACATAACC TTTGAAGGAG GTAACATCAC CTTTGGCTCC AGGAAAGCCG 2880

TAACAGAAAT CGAAGGCAAT GTTACTATCA ATAACAACGC TAACGTCACT CTTATCGGTT 2940

CGGATTTTGA CAACCATCAA AAACCTTTAA CTATTAAAAA AGATGTCATC ATTAATAGCG 3000

GCAACCTTAC CGCTGGAGGC AATATTGTCA ATATAGCCGG AAATCTTACC GTTGAAAGTA 3060

ACGCTAATTT CAAAGCTATC ACAAATTTCA CTTTTAATGT AGGCGGCTTG TTTGACAACA 3120

AAGGCAATTC AAATATTTCC ATTGCCAAAG GAGGGGCTCG CTTTAAAGAC ATTGATAATT 3180

CCAAGAATTT AAGCATCACC ACCAACTCCA GCTCCACTTA CCGCACTATT ATAAGCGGCA 3240

ATATAACCAA TAAAAACGGT GATTTAAATA TTACGAACGA AGGTAGTGAT ACTGAAATGC 3300

AAATTGGCGG CGATGTCTCG CAAAAAGAAG GTAATCTCAC GATTTCTTCT GACAAAATCA 3360

ATATTACCAA ACAGATAACA ATCAAGGCAG GTGTTGATGG GGAGAATTCC GATTCAGACG 3420

CGACAAACAA TGCCAATCTA ACCATTAAAA CCAAAGAATT GAAATTAACG CAAGACCTAA 3480

ATATTTCAGG TTTCAATAAA GCAGAGATTA CAGCTAAAGA TGGTAGTGAT TTAACTATTG 3540

GTAACACGAA TAGTGCTGAT GGTACTAATG CCAAAAAAGT AACCTTTAAC CAGGTTAAAG 3600

ATTCAAAAAT CTCTGCTGAC OGTCACAAjGG TGACACTACA CAGCAAAGTG GAAACATCCG 3660

GTAGTAATAA CAACACTGAA GATAGCAGTG ACAATAATGC CGGCTTAACT ATCGATGCAA 3720

AAAATGTAAC AGTAAACAAC AATATTACTT CTCACAAAGC AGTGAGCATC TCTGCGACAA 3780

GTGGAGAAAT TACCACTAAA ACAGGTACAA CCATTAACGC AACXACTGGT AACGTQGAGA 3840

TAACCGCTCA AACAGGTAGT ATCCTAGGTG GAATTGAGTC CAGCTCTGGC TCTGTAACAC 3900

TTACTGCAAC COAGGGCOCT CTTGCTQTAA GCAATATTTC GGGCAACACC GTTACTGTTA 3960

CTGCAAATAG CGGTGCATTA ACCACTTTGG CAGGCTCTAC AATTAAAGGA ACCGAGAGTG 4020

TAACCACTTC AAGTCAATCA GGCGATATCG GCGGTACGAT TTCTGGTGGC ACAGTAGAGG 4080

TTAAAGCAAC CGAAAGTTTA ACCACTCAAT CCAATTCAAA AATTAAAGCA ACAACAGGCG 4140

AGGCTAACGT AACAAGTGCA ACAGGTACAA TTGGTGGTAC GATTTCCGGT AATAOGGTAA 4200

ATGTTACGGC AAACGCTGGC GATTTAACAG TTGGGAATGG CGCAGAAATT AATGCGACAG 4260

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AAGGAGCTGC AACCTTAACT ACATCATCGG GCAAATTAAC TACC3AAGCT AGTTCACACA 4320 

TTACTTCAGC CAAGGGTCAG GTAAATCTTT CAGCTCAGGA TGGTAGCGTT GCAGGAAGTA 4380

TTAATGCCGC CAATGTOACA CTAAATACTA CAGGCACTTT AACTACCGTG AAGGGTTCAA 4440

ACATTAATGC AACCAGCGGT ACCTTGGTTA TTAACGCAAA AGACX3CTGAG CTAAATGGCG 4500

CAGCATTGGG TAACCACACA GTGGTAAATG CAACCAACGC AAATGGCTCC GGCAGCGTAA 4560

TCGCGACAAC CTCAAGCAGA GTGAACATCA CTGGGGATTT AATCACAATA AATGGATTAA 4620

ATATCATTTC AAAAAACGGT ATAAACACCG TACTGTTAAA AGGCGTTAAA ATTGATGTGA 4680

AATACATTCA ACCX;GGTATA GCAAGCGTAG ATGAAGTAAT TGAAGCGAAA CGCATCCTTG 4740

AGAAGGTAAA AGATTTATCT GATGAAGAAA GAGAAGCGTT AGCTAAACTT GGAGTAAGTG 4800

CTGTACGTTT TATTGAGCCA AATAATACAA TTACAGTCGA TACACAAAAT GAATTTGCAA 4860

CCAGACCATT AAGTCGAATA GTGATTTCTG AAGGCAGGGC GTGTTTCTCA AACAGTGATG 4920

GCGCGACGGT GTGCGTTAAT ATCGCTGATA ACGGGCGGTA GCGGTCAGTA ATTGACAAGG 4980

TAGATTTCAT CCTGCAATGA AGTCATTTTA TTTTCGTATT ATTTACTGTG TGGGTTAAAG 5040

TTCAGTACGG GCTTTACCCA TCTTGTAAAA AATTACGGAG AATACAATAA AGTATTTTTA 5100

ACAGGTTATT ATTATG 5116

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: ^ 

<A> "LENGTH: lS3iS araiino acids 
<B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Asn Lys lie Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu 
15 10 15 

Val Ala Val Ser Glu Leu Ala Arg Gly Cys Asp His Ser Thr Glu Lys 
20 25 30 

Gly Ser Glu Lys Pro Ala Arg Met Lys Val Arg His Leu Ala Leu Lys 
35 40 45 

Pro Leu Ser Ala Met Leu Leu Ser Leu Gly Val Thr Ser He Pro Gin 

SO ^ 55 60 

Ser Val Leu Ala Ser Gly Leu Gin Gly Met Asp Val Val His Oly Thr 
€5 70 75 60 

Ala Thr Met Gin Val Asp Gly Asn Lys Thr He He Arg Asn Ser Val 
85 90 95 

Asp Ala He He Asn Trp Lys Gin Phe Asn He Asp Gin Asn Glu Met 
100 105 110 

Val Gin Phe Leu Gin Glu Asn Asn Asn Ser Ala Val Phe Asn Arg Val 
115 120 125 
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130 



140 



Gin Val Phe Leu lie Asn Pro Asn Gly He Thr He Gly Lys Asp Ala 

150 iss ^ / f 

He He Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp He Ser Asn 
165 170 

Glu Asn He Lys Ala Arg Asn Phe Thr Phe Glu Gin Thr Lys Asp Lys 
"0 les 190 

Ala Leu Ala Glu He Val Ash His Gly Leu He Thr Val Gly Lvs Asn 
195 ^ 200 205 

Gly Ser Val Asn Leu He Gly Gly Lys Val Lys Asn Glu Gly Val He 
210 215 220 

Ser Val Asn Gly Gly Ser He Ser Leu Leu Ala Gly Gin Lys He Thr 

230 235 240 

He Ser Asp He He Asn Pro Thr He Thr Tyr Ser He Ala Ala Pro 
245 250 255 

Glu Asn Glu Ala Val Asn Leu Gly Asp He Phe Ala Lys Gly Gly Asn 
260 265 270 

He Asn Val Arg Ala Ala Thr He Arg Asn Gin Gly Lys Leu Ser Ala 
275 280 285 

Asp Ser val Ser Lys Asp Lys Ser Gly Asn He Val Leu Ser Ala Lvs 
290 29S 



300 



Glu Gly Glu Ala Glu lie Gly Gly val He Ser Ala Gln^Asn Gin Gin 
305 1^ 310 315 320 

f- 

Ala Lys Gly Gl^ Lys Leu Met He Thr Gly Asp Lys Val Thr Leu Lvs 
325 330 335 

Thr Gly Ala Val He Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tvr 
340 345 350 ^ 

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly He Gin Leu Ala 
355 360 3SS 

Lys Lys Thr Ser Leu Glu Lya Gly Ser Thr lie Asn Val Ser Gly Lys 



380 



Glu Lys Gly Gly Arg Ala He Val Trp Gly Asp He Ala Leu He Asp 

390 395 

Gly Asn He Asn Ala Gin Gly Ser Gly Asp He Ala Lys Thr Gly Qlv 
405 410 415 ' 

Phe Val Glu Thr Ser Gly His Asp Leu Phe He Lys Asp Asn Ala He 
*20^ 425 430 

Val Asp Ala Lys' Glu Trp Leu Z,eu Asp Phe Asp Asn Val Ser He Asn 
435 440 445 

450 ^^'^ '^P '^P Thr 

Gly Ser Gly Asn Ser Ala Ser Thr Pro Lys Arg Asn Lys Glu Lys Thr 
465 470 475 480 
Thr Leu Thr Asn Thr Thr Leu Glu Ser Ile Leu Lys Lys Gly Thr Phe
485 490 495

Val Asn Ile Thr Ala Asn Gln Arg Ile Tyr Val Asn Ser Ser Ile Asn
500 505 510

Leu Ser Asn Gly Ser Leu Thr Leu Trp Ser Glu Gly Arg Ser Gly Gly
515 520 525

Gly Val Glu Ile Asn Asn Asp Ile Thr Thr Gly Asp Asp Thr Arg Gly
530 535 540

Ala Asn Leu Thr Ile Tyr Ser Gly Gly Trp Val Asp Val His Lys Asn
545 550 555 560

Ile Ser Leu Gly Ala Gln Gly Asn Ile Asn Ile Thr Ala Lys Gln Asp
565 570 575

Ile Ala Phe Glu Lys Gly Ser Asn Gln Val Ile Thr Gly Gln Gly Thr
580 585 590

Ile Thr Ser Gly Asn Gln Lys Gly Phe Arg Phe Asn Asn Val Ser Leu
595 600 605

Asn Gly Thr Gly Ser Gly Leu Gln Phe Thr Thr Lys Arg Thr Asn Lys
610 615 620

Tyr Ala Ile Thr Asn Lys Phe Glu Gly Thr Leu Asn Ile Ser Gly Lys
625 630 635 640

Val Asn Ile Ser Met Val Leu Pro Lys Asn Glu Ser Gly Tyr Asp Lys
645 650 655

Phe Lys Gly Arg Thr Tyr Trp Asn Leu Thr Ser Leu Asn Val Ser Glu
660 665 670

Ser Gly Glu Phe Asn Leu Thr Ile Asp Ser Arg Gly Ser Asp Ser Ala
675 680 685

Gly Thr Leu Thr Gln Pro Tyr Asn Leu Asn Gly Ile Ser Phe Asn Lys
690 695 700

Asp Thr Thr Phe Asn Val Glu Arg Asn Ala Arg Val Asn Phe Asp Ile
705 710 715 720

Lys Ala Pro Ile Gly Ile Asn Lys Tyr Ser Ser Leu Asn Tyr Ala Ser
725 730 735

Phe Asn Gly Asn Ile Ser Val Ser Gly Gly Gly Ser Val Asp Phe Thr
740 745 750

Leu Leu Ala Ser Ser Ser Asn Val Gln Thr Pro Gly Val Val Ile Asn
755 760 765

Ser Lys Tyr Phe Asn Val Ser Thr Gly Ser Ser Leu Arg Phe Lys Thr
770 775 780

Ser Gly Ser Thr Lys Thr Gly Phe Ser Ile Glu Lys Asp Leu Thr Leu
785 790 795 800

Asn Ala Thr Gly Gly Asn Ile Thr Leu Leu Gln Val Glu Gly Thr Asp
805 810 815

Gly Met Ile Gly Lys Gly Ile Val Ala Lys Lys Asn Ile Thr Phe Glu
820 825 830

Gly Gly Asn Ile Thr Phe Gly Ser Arg Lys Ala Val Thr Glu Ile Glu
835 840 845

Gly Asn Val Thr Ile Asn Asn Asn Ala Asn Val Thr Leu Ile Gly Ser
850 855 860

Asp Phe Asp Asn His Gln Lys Pro Leu Thr Ile Lys Lys Asp Val Ile
865 870 875 880

Ile Asn Ser Gly Asn Leu Thr Ala Gly Gly Asn Ile Val Asn Ile Ala
885 890 895

Gly Asn Leu Thr Val Glu Ser Asn Ala Asn Phe Lys Ala Ile Thr Asn
900 905 910

Phe Thr Phe Asn Val Gly Gly Leu Phe Asp Asn Lys Gly Asn Ser Asn
915 920 925

Ile Ser Ile Ala Lys Gly Gly Ala Arg Phe Lys Asp Ile Asp Asn Ser
930 935 940

Lys Asn Leu Ser Ile Thr Thr Asn Ser Ser Ser Thr Tyr Arg Thr Ile
945 950 955 960

Ile Ser Gly Asn Ile Thr Asn Lys Asn Gly Asp Leu Asn Ile Thr Asn
965 970 975

Glu Gly Ser Asp Thr Glu Met Gln Ile Gly Gly Asp Val Ser Gln Lys
980 985 990

Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Ile Asn Ile Thr Lys Gln
995 1000 1005

Ile Thr Ile Lys Ala Gly Val Asp Gly Glu Asn Ser Asp Ser Asp Ala
1010 1015 1020

Thr Asn Asn Ala Asn Leu Thr Ile Lys Thr Lys Glu Leu Lys Leu Thr
1025 1030 1035 1040

Gln Asp Leu Asn Ile Ser Gly Phe Asn Lys Ala Glu Ile Thr Ala Lys
1045 1050 1055

Asp Gly Ser Asp Leu Thr Ile Gly Asn Thr Asn Ser Ala Asp Gly Thr
1060 1065 1070

Asn Ala Lys Lys Val Thr Phe Asn Gln Val Lys Asp Ser Lys Ile Ser
1075 1080 1085

Ala Asp Gly His Lys Val Thr Leu His Ser Lys Val Glu Thr Ser Gly
1090 1095 1100

f f Ac^^ Asn Asn Ala Gly Leu Thr 

1105 1110 1115 1120

Ile Asp Ala Lys Asn Val Thr Val Asn Asn Asn Ile Thr Ser His Lys
1125 1130 1135

Ala Val Ser Ile Ser Ala Thr Ser Gly Glu Ile Thr Thr Lys Thr Gly
1140 1145 1150

Thr Thr Ile Asn Ala Thr Thr Gly Asn Val Glu Ile Thr Ala Gln Thr
1155 1160 1165

Gly Ser Ile Leu Gly Gly Ile Glu Ser Ser Ser Gly Ser Val Thr Leu
1170 1175 1180






Thr Ala Thr Glu Gly Ala Leu Ala Val Ser Asn Ile Ser Gly Asn Thr
1185 1190 1195 1200

Val Thr Val Thr Ala Asn Ser Gly Ala Leu Thr Thr Leu Ala Gly Ser
1205 1210 1215

Thr Ile Lys Gly Thr Glu Ser Val Thr Thr Ser Ser Gln Ser Gly Asp
1220 1225 1230

Ile Gly Gly Thr Ile Ser Gly Gly Thr Val Glu Val Lys Ala Thr Glu
1235 1240 1245

^^"^ itcn'^*''' '^^'^ ^^"^ Thr Thr Gly Glu 

1250 1255 1260

Ala Asn Val Thr Ser Ala Thr Gly Thr Ile Gly Gly Thr Ile Ser Gly
1265 1270 1275 1280

Asn Thr Val Asn Val Thr Ala Asn Ala Gly Asp Leu Thr Val Gly Asn
1285 1290 1295

Gly Ala Glu Ile Asn Ala Thr Glu Gly Ala Ala Thr Leu Thr Thr Ser
1300 1305 1310

Ser Gly Lys Leu Thr Thr Glu Ala Ser Ser His Ile Thr Ser Ala Lys
1315 1320 1325

Gly Gln Val Asn Leu Ser Ala Gln Asp Gly Ser Val Ala Gly Ser Ile
1330 1335 1340

Asn Ala Ala Asn Val Thr Leu Asn Thr Thr Gly Thr Leu Thr Thr Val
1345 1350 1355 1360

Lys Gly Ser Asn Ile Asn Ala Thr Ser Gly Thr Leu Val Ile Asn Ala
1365 1370 1375

Lys Asp Ala Glu Leu Asn Gly Ala Ala Leu Gly Asn His Thr Val Val
1380 1385 1390

Asn Ala Thr Asn Ala Asn Gly Ser Gly Ser Val Ile Ala Thr Thr Ser
1395 1400 1405

Ser Arg Val Asn Ile Thr Gly Asp Leu Ile Thr Ile Asn Gly Leu Asn
1410 1415 1420

Ile Ile Ser Lys Asn Gly Ile Asn Thr Val Leu Leu Lys Gly Val Lys
1425 1430 1435 1440

Ile Asp Val Lys Tyr Ile Gln Pro Gly Ile Ala Ser Val Asp Glu Val
1445 1450 1455

Ile Glu Ala Lys Arg Ile Leu Glu Lys Val Lys Asp Leu Ser Asp Glu
1460 1465 1470

Glu Arg Glu Ala Leu Ala Lys Leu Gly Val Ser Ala Val Arg Phe Ile
1475 1480 1485

Glu Pro Asn Asn Thr Ile Thr Val Asp Thr Gln Asn Glu Phe Ala Thr
1490 1495 1500

Arg Pro Leu Ser Arg Ile Val Ile Ser Glu Gly Arg Ala Cys Phe Ser
1505 1510 1515 1520

Asn Ser Asp Gly Ala Thr Val Cys Val Asn Ile Ala Asp Asn Gly Arg
1525 1530 1535
(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4937 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TAAATATACA AGATAATAAA AATAAATCAA GATTTTTGTG ATGACAAACA ACAATTACAA 60
CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAAAT AGTATAAATC CGCCATATAA 120
AATGGTATAA TCTTTCATCT TTCATCTTTA ATCTTTCATC TTTCATCTTT CATCTTTCAT 180
CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC TTTCATCTTT 240
CACATOAAAT GATGAACCGA GGGAAGGGAG GGAGGGGCAA GAATGAAGAG GGAGCTGAAC 300
GAACGCAAAT GATAAAGTAA TTTAATTGTT CAACTAACCT TAGGAGAAAA TATGAACAAG 360
ATATATCGTC TCAAATTCAG CAAACGCCTG AATGCTTTGG TTGCTGTGTC TGAATTGGCA 420
CGGGGTTGTG ACCATTCCAC AGAAAAAGGC TTCCGCTATG TTACTATCTT TAGGTGTAAC 480
CACTTAGCGT TAAAGCCACT TTCCGCTATG TTACTATCTT TAGGTGTAAC ATCTATTCCA 540
CAATCTGTTT TAGCAAGCGG CTTACAAOGA ATOGATGTAG TACACGGCAC AGCCACTATG 600
CAAGTAGATG GTAATAAAAC CATTATCCQC AACAGTGTTX3 ACGCTATCAT TAATTGGAAA 660
CAATTTAACA TCGACCAAAA TGAAATQGTG CAGTTTTTAC AAGAAAACAA CAACTCCGCC 720
GTATTCAACC GTGTTACATC TAACCAAATC TCCCAATTAA AAGGGATTTT AGATTCTAAC 780
GGACAAGTCT TTTTAATCAA CCCAAATGGT ATCACAATAG GTAAAGACGC AATTATTAAC 840
ACTAATGGCT TTACGGCTTC TACGCTAGAC ATTTCTAACG AAAACATCAA GGCGCGTAAT 900
TTCACCTTCG AGCAAACCAA AGATAAAGCG CTCGCTGAAA TTQTGAATCA CGGTTTAATT 960
ACTGTCGGTA AAGACGOCAG TGTAAATCTT ATTGOTGOCA AAGTGAAAAA CGAGGGTGTG 1020
ATTAGCGTAA ATGGTOQCAG CATTTCTTTA CTCGCAQGGC AAAAAATCAC CATCAGCGAT 1080
ATAATAAACC CAACCATTAC TTACAGCATT GCCGOGCCTG AAAATGAAGC GCTCAATCTO 1140
GGCGATATTT TTGCCAAAGG CGGTAACATT AATGTCCOTX3 CTGCCACTAT TCGAAACCAA 1200
GGTAAACTTT CTGCTGATTC TGTAAGCAAA GATAAAAGOG GCAATATTGT TCTTTCCGCC 1260
AAAGAGGGTG AAGCGGAAAT TGGCGOTGTA ATTTOCOCTC AAAATCAGCA AGCTAAAOGC 1320
GGCAAGCTGA TGATTACAGG CGATAAAOTC ACATTAAAAA CAGGTGCAGT TATCGACCTT 1380
TCAGGTAAAG AAGGGGGAOA AACTTACCTT GOCGGTGACG AGCGCGGCGA AGGTAAAAAC 1440
GGCATTCAAT TAGCAAAGAA AACCTCTTTA GAAAAAGGCT CAACCATCAA TGTATCAGGC 1500
AAAGAAAAAG GCGGACGCGC TATTGTGTGG GGCGATATTG CGTTAATTGA CGGCAATATT 1560
AACGCTCAAG GTAGTGGTGA TATCGCTAAA ACCGGTGGTT TTGTGGAGAC ATCGGGGCAT 1620
TATTTATCCA TTGACAGCAA TGCAATTGTT AAAACAAAAG AGTGGTTGCT AGACCCTGAT 1680
GATGTAACAA TTGAW3CCGA AGACCCCCTT CGCAATAATA CCGGTATAAA TGATGAATTC 1740
CCAACAGGCA CCGGTGAAGC AAGCGACCCT AAAAAAAATA GCGAACTCAA AACAACGCTA 1800
ACCAATACAA CTATTTCAAA TTATCTGAAA AACGCCTGGA CAATGAATAT AACGGCATCA 1860
AGAAAACTTA CCGTTAATAG CTCAATCAAC ATCGGAAGCA ACTCCCACTT AATTCTCCAT 1920
AGTAAIVGGTC AGCGTGGCGG AGGCGTTCAG ATTGATGGAG ATATTACTTC TAAAGGCGGA 1980
AATTTAACCA TTTATTCTGG CGGATGGGTT GATGTTCATA AAAATATTAC GCTTGATCAG 2040
GGTTTTTTAA ATATTACCGC CGCTTCCGTA GCTTTTGAAG GTGGAAATAA CAAAGCACGC 2100
GACGCGGCAA ATGCTAAAAT TGTCGCCCAG GGCACTGTAA CCATTACAGG AGAGGGAAAA 2160
GATTTCAGGG CTAACAACGT ATCTTTAAAC GGAACGGGTA AAGGTCTGAA TATCATTTCA 2220
TCAGTGAATA ATTTAACCCA CAATCTTAGT GGCACAATTA ACATATCTGG GAATATAACA 2280
ATTAACCAAA CTACGAGAAA GAACACCTCG TATTGGCAAA CCAGCCATGA TTCGCACTGG 2340
AACGTCAGTG CTCT*rAATCT AGAGACAGGC GCAAATTTTA CCTTTATTAA ATACATTTCA 2400
AGGAATAGCA AAGGCTTAAC AACACAGTAT AGAAOCTCTG CAQGGGTGAA TTTTAACQGC 2460
GTAAATGGCA ACATGTCATT CAATCTCAAA GAAGGAGCGA AAGTTAATTT CAAATTAAAA 2520
CCAAACGAGA ACATGAACAC AAGCAAACCT TTACCAATTC GGTTTTTAGC CAATATCACA 2580
GCCACTGGTG GGGGCTCTGT TTTITTTGAT ATATATGCCA ACCATTCTGG CAGAGGGGCT 2640
GAGTTAAAAA TGAGTGAAAT TAATATCTCT AACGGCGCTA ATTTTACCTT AAATTCCCAT 2700
GTTCGCGGCG ATGACGCTTT TAAAATGAAC AAAOACTTAA CCATAAATGC AACCAATTCA 2760
AATTTCAGCC TCAGACAGAC GAAAGATGAT TTTTATQACG GGTACGCACX3 CAATGCCATC 2820
AATTCAACCT ACAACATATC CATTCTGGGC GGTAATGTCA CCCTTGGTGG ACAAAACTCA 2880
AGCAGCAGCA TTACGGGGAA TATTACTATC GAGAAAGCAG CAAATGTTAC GCTAGAAGCC 2940
AATAACGCCC CTAATCAGCA AAACATAAGG GATAGAGTTA TAAAACTTGG CAGCTTGCTC 3000
GTTAATGGGA GTTTAAGTTT AACTGGCGAA AATGCAGATA TTAAAGGCAA TCTCACTATT 3060
TCAGAAAGCG CCACTTTTAA AGGAAAGACT AGAGATACCC TAAAXATCAC CGGCAATTTT 3120
ACCAATAATG GCACTGCCOA AATTAATATA ACACAAGGAG TGGTAAAACT TQGCAATGTT 3180
ACCAATGATG GTGATTTAAA CATTACCACT CACGCTAAAC GCAACCAAAG AAGCATCATC 3240
GGCGQAGATA TAATCAACAA AAAAGGAAGC TTAAATATTA CAGACAGTAA TAATGATGCT 3300
GAAATCCAAA TTGGCGGCAA TATCTCGCAA AAAGAAGGCA ACCTCACGAT TTCTTCCGAT 3360
AAAATTAATA tCACXZAAACA GATAACAATC AAAAAGGGTA TTGATGGAGA GGACTCTAGT 3420
TCAGATGCGA CAAGTAATGC CAACXrTAACT ATTAAAACCA AAGAATTGAA ATTGACAGAA 3480
GACCTAASTA TTTCAGGTTT CAATAAAGCA GAGATTACAG CCAAAGATGG TAGAGATTTA 3540
ACTATTGGCA ACAGTAATGA CGGTAACAGC GGTGCCGAAG CCAAAACAGT AACrriTAAC 3600
AATGTTAAAG ATTCAAAAAT CTCTGCTGAC GGTCACAATG TGACACTAAA TAGCAAAGTG 3660
AAAACATCTA GCAGCAATGG CGGACGTGAA AGCAATAGCG ACAACGATAC CGGCTTAACT 3720
ATTACTGCAA AAAATGTAGA AGTAAACAAA GATATTACTT CTCTCAAAAC AGTAAATATC 3780
ACCGCGTCGG AAAAGGTTAC CACCACAGCA GGCTCGACCA TTAACGCAAC AAATGGCAAA 3840
GCAAGTATTA CAACCAAAAC AGGTGATATC AGCGGTACGA TTTCCGGTAA CACGGTAAGT 3900
GTTAGCGCGA CTGGTGATTT AACCACTAAA TCCGGCTCAA AAATTGAAGC GAAATCGGGT 3960
GAGGCTAATG TAACAAGTGC AACAGGTACA ATTGGCGGTA CAATTTCCGG TAATACGGTA 4020
AATGTTACGG CAAACGCTGG CGATTTAACA GTTGGGAATG GCGCAGAAAT TAATGCGACA 4080
GAAGGAGCTG CAACCTTAAC CGCAACAGGG AATACCTTGA CTACTGAAGC CGGTTCTAGC 4140
ATCACTTCAA CTAAGGGTCA GGTAGACCTC TTGGCTCAGA ATGGTAGCAT CGCAGGAAGC 4200
ATTAATGCTG CTAATGTGAC ATTAAATACT ACAGGCACCT TAACCACCGT GGCAGGCTCG 4260
GATATTAAAG CAACCAGCGG CACCTTGGTT ATTAACGCAA AAGATCCTAA GCTAAATGGT 4320
GATGCATCAG GTGATAGTAC AGAAGTGAAT GCAGTCAACG CAAGCGGCTC TGGTAGTGTG 4380
ACTGCGGCAA CCTCAAGCAG TGTGAATATC ACTGGGGATT TAAACACAGT AAATGGGTTA 4440
AATATCATTT CGAAAGATGG TAGAAACACT GTGCGCTTAA GAGGCAAGGA AATTGAGGTG 4500
AAATATATCC AGCCAGGTGT AGCAAGTGTA GAAGAAGTAA TTGAAGCGAA ACGCGTCCTT 4560
GAAAAAGTAA AAGATTTATC TGATGAAGAA AGAQAAACAT TAGCTAAACT TGGTGTAAGT 4620
GCTGTACGTT TTGTTGAGCC AAATAATACA ATTACAGTCA ATACACAAAA TGAATTTACA 4680
ACCAGACCGT CAAGTCAAGT GATAATTTCT GAAGGTAAGG CGTGTTTCTC AAGTGGTAAT 4740
GGCGCACGAG TATGTACCAA TGTTGCTGAC GATGGACAGC CGTAGTCAGT AATTGACAAG 4800
GTAGATTTCA TCCTGCAATG AAGTCArTTT ATTITCGTAT TATTTACTGT GTGGGTTAAA 4860
GTTCAGTACG GGCTTTACCC ATCTTGTAAA AAATTACGGA GAATACAATA AAGTATTTTT 4920
AACAGGTTAT TATTATG 4937

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1477 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Ala Arg Gly Cys Asp His Ser Thr Glu Lys
20 25 30

Gly Ser Glu Lys Pro Ala Arg Met Lys Val Arg His Leu Ala Leu Lys
35 40 45

Pro Leu Ser Ala Met Leu Leu Ser Leu Gly Val Thr Ser Ile Pro Gln
50 55 60
Ser Val Leu Ala Ser Gly Leu Gln Gly Met Asp Val Val His Gly Thr
65 70 75 80

Ala Thr Met Gln Val Asp Gly Asn Lys Thr Ile Ile Arg Asn Ser Val
85 90 95

Asp Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met
100 105 110

Val Gln Phe Leu Gln Glu Asn Asn Asn Ser Ala Val Phe Asn Arg Val
115 120 125

Thr Ser Asn Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly
130 135 140

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala
145 150 155 160

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn
165 170 175

Glu Asn Ile Lys Ala Arg Asn Phe Thr Phe Glu Gln Thr Lys Asp Lys
180 185 190

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp
195 200 205

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile
210 215 220

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr
225 230 235 240

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro
245 250 255

Glu Asn Glu Ala Val Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn
260 265 270

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Gln Gly Lys Leu Ser Ala
275 280 285

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys
290 295 300

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln
305 310 315 320

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys
325 330 335

Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr
340 345 350

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala
355 360 365

Lys Lys Thr Ser Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys
370 375 380

Glu Lys Gly Gly Phe Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp
385 390 395 400

Gly Asn Ile Asn Ala Gln Gly Ser Gly Asp Ile Ala Lys Thr Gly Gly
405 410 415

Phe Val Glu Thr Ser Gly His Asp Leu Phe Ile Lys Asp Asn Ala Ile
420 425 430

Val Asp Ala Lys Glu Trp Leu Leu Asp Phe Asp Asn Val Ser Ile Asn
435 440 445

Ala Glu Asp Pro Leu Phe Asn Asn Thr Gly Ile Asn Asp Glu Phe Pro
450 455 460

Thr Gly Thr Gly Glu Ala Ser Asp Pro Lys Lys Asn Ser Glu Leu Lys
465 470 475 480

Thr Thr Leu Thr Asn Thr Thr Ile Ser Asn Tyr Leu Lys Asn Ala Trp
485 490 495

Thr Met Asn Ile Thr Ala Ser Arg Lys Leu Thr Val Asn Ser Ser Ile
500 505 510

Asn Ile Gly Ser Asn Ser His Leu Ile Leu His Ser Lys Gly Gln Arg
515 520 525

Gly Gly Gly Val Gln Ile Asp Gly Asp Ile Thr Ser Lys Gly Gly Asn
530 535 540

Leu Thr Ile Tyr Ser Gly Gly Trp Val Asp Val His Lys Asn Ile Thr
545 550 555 560

Leu Asp Gln Gly Phe Leu Asn Ile Thr Ala Ala Ser Val Ala Phe Glu
565 570 575

Gly Gly Asn Asn Lys Ala Arg Asp Ala Ala Asn Ala Lys Ile Val Ala
580 585 590

Gln Gly Thr Val Thr Ile Thr Gly Glu Gly Lys Asp Phe Arg Ala Asn
595 600 605

Asn Val Ser Leu Asn Gly Thr Gly Lys Gly Leu Asn Ile Ile Ser Ser
610 615 620

Val Asn Asn Leu Thr His Asn Leu Ser Gly Thr Ile Asn Ile Ser Gly
625 630 635 640

Asn Ile Thr Ile Asn Gln Thr Thr Arg Lys Asn Thr Ser Tyr Trp Gln
645 650 655

Thr Ser His Asp Ser His Trp Asn Val Ser Ala Leu Asn Leu Glu Thr
660 665 670

Gly Ala Asn Phe Thr Phe Ile Lys Tyr Ile Ser Ser Asn Ser Lys Gly
675 680 685

Leu Thr Thr Gln Tyr Arg Ser Ser Ala Gly Val Asn Phe Asn Gly Val
690 695 700

Asn Gly Asn Met Ser Phe Asn Leu Lys Glu Gly Ala Lys Val Asn Phe
705 710 715 720

Lys Leu Lys Pro Asn Glu Asn Met Asn Thr Ser Lys Pro Leu Pro Ile
725 730 735

Arg Phe Leu Ala Asn Ile Thr Ala Thr Gly Gly Gly Ser Val Phe Phe
740 745 750

Asp Ile Tyr Ala Asn His Ser Gly Arg Gly Ala Glu Leu Lys Met Ser
755 760 765
Glu Ile Asn Ile Ser Asn Gly Ala Asn Phe Thr Leu Asn Ser His Val
770 775 780

Arg Gly Asp Asp Ala Phe Lys Ile Asn Lys Asp Leu Thr Ile Asn Ala
785 790 795 800

Thr Asn Ser Asn Phe Ser Leu Arg Gln Thr Lys Asp Asp Phe Tyr Asp
805 810 815

Gly Tyr Ala Arg Asn Ala Ile Asn Ser Thr Tyr Asn Ile Ser Ile Leu
820 825 830

Gly Gly Asn Val Thr Leu Gly Gly Gln Asn Ser Ser Ser Ser Ile Thr
835 840 845

Gly Asn Ile Thr Ile Glu Lys Ala Ala Asn Val Thr Leu Glu Ala Asn
850 855 860

Asn Ala Pro Asn Gln Gln Asn Ile Arg Asp Arg Val Ile Lys Leu Gly
865 870 875 880

Ser Leu Leu Val Asn Gly Ser Leu Ser Leu Thr Gly Glu Asn Ala Asp
885 890 895

Ile Lys Gly Asn Leu Thr Ile Ser Glu Ser Ala Thr Phe Lys Gly Lys
900 905 910

Thr Arg Asp Thr Leu Asn Ile Thr Gly Asn Phe Thr Asn Asn Gly Thr
915 920 925

Ala Glu Ile Asn Ile Thr Gln Gly Val Val Lys Leu Gly Asn Val Thr
930 935 940

Asn Asp Gly Asp Leu Asn Ile Thr Thr His Ala Lys Arg Asn Gln Arg
945 950 955 960

Ser Ile Ile Gly Gly Asp Ile Ile Asn Lys Lys Gly Ser Leu Asn Ile
965 970 975

Thr Asp Ser Asn Asn Asp Ala Glu Ile Gln Ile Gly Gly Asn Ile Ser
980 985 990

Gln Lys Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Ile Asn Ile Thr
995 1000 1005

Lys Gln Ile Thr Ile Lys Lys Gly Ile Asp Gly Glu Asp Ser Ser Ser
1010 1015 1020

Asp Ala Thr Ser Asn Ala Asn Leu Thr Ile Lys Thr Lys Glu Leu Lys
1025 1030 1035 1040

Leu Thr Glu Asp Leu Ser Ile Ser Gly Phe Asn Lys Ala Glu Ile Thr
1045 1050 1055

Ala Lys Asp Gly Arg Asp Leu Thr Ile Gly Asn Ser Asn Asp Gly Asn
1060 1065 1070

Ser Gly Ala Glu Ala Lys Thr Val Thr Phe Asn Asn Val Lys Asp Ser
1075 1080 1085

Lys Ile Ser Ala Asp Gly His Asn Val Thr Leu Asn Ser Lys Val Lys
1090 1095 1100

Thr Ser Ser Ser Asn Gly Gly Arg Glu Ser Asn Ser Asp Asn Asp Thr
1105 1110 1115 1120

Gly Leu Thr Ile Thr Ala Lys Asn Val Glu Val Asn Lys Asp Ile Thr
1125 1130 1135

Ser Leu Lys Thr Val Asn Ile Thr Ala Ser Glu Lys Val Thr Thr Thr
1140 1145 1150

Ala Gly Ser Thr Ile Asn Ala Thr Asn Gly Lys Ala Ser Ile Thr Thr
1155 1160 1165

'^P ^^'^ "^^^ lie Se»^ Gly Asn Thr Val Ser Val 

1170 1175 1180

fffc^* "^^"^ "^^^ '^^'^ Ser Gly Ser Lys lie Glu Ala 

1185 1190 1195 1200

Lys Ser Gly Glu Ala Asn Val Thr Ser Ala Thr Gly Thr Ile Gly Gly
1205 1210 1215

Thr Ile Ser Gly Asn Thr Val Asn Val Thr Ala Asn Ala Gly Asp Leu
1220 1225 1230

Thr Val Gly Asn Gly Ala Glu Ile Asn Ala Thr Glu Gly Ala Ala Thr
1235 1240 1245

Gly Ser Ser Ile
1250 1255 1260

Thr Ser Thr Lys Gly Gln Val Asp Leu Leu Ala Gln Asn Gly Ser Ile
1265 1270 1275 1280

Ala Gly Ser Ile Asn Ala Ala Asn Val Thr Leu Asn Thr Thr Gly Thr
1285 1290 1295

Leu Thr Thr Val Ala Gly Ser Asp Ile Lys Ala Thr Ser Gly Thr Leu
1300 1305 1310

Val Ile Asn Ala Lys Asp Ala Lys Leu Asn Gly Asp Ala Ser Gly Asp
1315 1320 1325

^^'^ T^n^^"* Ser Gly Ser Gly Ser Val Thr
1330 1335 1340

Ala Ala Thr Ser Ser Ser Val Asn Ile Thr Gly Asp Leu Asn Thr Val
1345 1350 1355 1360

Asn Gly Leu Asn Ile Ile Ser Lys Asp Gly Arg Asn Thr Val Arg Leu
1365 1370 1375

Arg Gly Lys Glu Ile Glu Val Lys Tyr Ile Gln Pro Gly Val Ala Ser
1380 1385 1390

Val Glu Glu Val Ile Glu Ala Lys Arg Val Leu Glu Lys Val Lys Asp
1395 1400 1405

fffo'^P ^ ^« Gly Val Ser Ala
1410 1415 1420

Y^ic*^ ***** Thr Val Asn Thr Gln Asn
1425 1430 1435 1440

Glu Phe Thr Thr Arg Pro Ser Ser Gln Val Ile Ile Ser Glu Gly Lys
1445 1450 1455

Ala Cys Phe Ser Ser Gly Asn Gly Ala Arg Val Cys Thr Asn Val Ala
1460 1465 1470

Asp Asp Gly Gln Pro
1475

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9171 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



ACAGCGTTCT CTTAATACTA GTACAAACCC ACAATAAAAT ATGACAAACA ACAATTACAA 60
CACCTTTTTT GCAGTCTATA TGCAAATATT TTAAAAAATA GTATAAATCC GCCATATAAA 120
ATGGTATAAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC ATCTTTCATC 180
TTTCATCTTT CATCTTTCAT CTTTCATCTT TCATCTTTCA TCTTTCATCT TTCATCTTTC 240
ACATGAAATG ATGAACCGAG GGAAGGGAGG GAGGGGCAAG AATGAAGAGG GAGCTGAACG 300
AACGCAAATG ATAAAGTAAT TTAATTGTTC AACTAACCTT AGGAGAAAAT ATGAACAAGA 360
TATATCGTCT CAAATTCAGC AAACGCCTGA ATGCTTTGGT TGCTGTGTCT GAATTGGCAC 420
GGGGTTGTGA CCATTCCACA GAAAAAGGCA GCGAAAAACC TGCTCGCATG AAAGTGCGTC 480
ACTTAGCGTT AAAGCCACTT TCCGCTATGT TACTATCTTT AGGTGTAACA TCTATTCCAC 540
AATCTGTTTT AGCAAGCGGC TTACAAGGAA TGGATGTAGT ACACGGCACA GCCACTATGC 600
AAGTAGATGG TAATAAAACC ATTATCCGCA ACAGTGTTGA CGCTATCATT AATTGGAAAC 660
AATTTAACAT CGACCAAAAT GAAATGGTGC AGT-rriTACA AGAAAACAAC AACTCCGCCG 720
TATTCAACCG TGTTACATCT AACCAAATCT CCCAATTAAA AGGGATTTTA GATTCTAACG 780
GACAAGTCTT TTTAATCAAC CCAAATGGTA TCACAATAGG TAAAGACGCA ATTATTAACA 840
CTAATGGCTT TACGGCTTCT ACGCTAGACA TTTCTAACGA AAACATCAAG GCGCGTAATT 900
TCACCTTCGA GCAAACCAAA GATAAAGCGC TCGCTGAAAT TGTGAATCAC GGTTTAATTA 960
CTGTCGGTAA AGACGGCAGT GTAAATCTTA TTGGTGGCAA AGTGAAAAAC GAGGGTGTGA 1020
TTAGCGTAAA TGGTGGCAGC ATTTCTTTAC TCGCAGGGCA AAAAATCACC ATCAGCGATA 1080
TAATAAACCC AACCATTACT TACAGCATTG CCGCGCCTGA AAATGAAGCG GTCAATCTGG 1140
GCGATATTTT TGCCAAAGGC GGTAACATTA ATGTCCGTGC TGCCACTATT CGAAACCAAG 1200
CTTTCOGCCA AAGAGGGTGA AGCGGAAATT GGCGGTGTAA TTTCCGCTCA AAATCAGCAA 1260
GCTAAAGGCG GCAAGCTGAT GATTACAGGC GATAAAGTCA CATTAAAAAC AGGTGCAGTT 1320
ATCGACCTTT CAGGTAAAGA AGGGGGAGAA ACTTACCTTG GCGGTGACGA GCGCGGCGAA 1380
GGTAAAAACG GCATTCAATT AGCAAAGAAA ACCTCTTTAG AAAAAGGCTC AACCATCAAT 1440
GTATCAGGCA AAGAAAAAGG CGGACGCGCT ATTGTGTGGG GCGATATTGC GTTAATTGAC 1500
GGCAATATTA ACGCTCAAGG TAGTGGTGAT ATCGCTAAAA CCGGTCGTTT TGTGGAGACG 1560

TCGGGGCATG ATTTATTCAT CAAAGACAAT GCAATTGTTG ACGCCAAAGA GTGGTTGTTA 1620 

GACCCGGATA ATGTATCTAT TAATGCAGAA ACAGCAGGAC GCAGCAATAC TTCAGAAGAC 1680 

GATGAATACA CGGGATCCGG GAATAGTGCC AGCACCCCAA AACGAAACAA AGAAAAGACA 1740 

ACATTAACAA ACACAACTCT TGAGAGTATA CTAAAAAAAG GTACCTTTGT TAACATCACT 1800 

GCTAATCAAC GCATCTATGT CAATAGCTCC ATTAATTTAT CCAATGGCAG CTTAACTCTT 1860 

TGGAGTGAGG GTCGGAGCGG TGGCGGCGTT GAGATTAACA ACGATATTAC CACCGGTGAT 1920 

GATACCAGAG GTGCAAACTT AACAATTTAC TCAGGCX3GCT GGGTTGATGT TCATAAAAAT 1980 

ATCTCACTCG GGGCGCAAGG TAACATAAAC ATTACAGCTA AACAAGATAT CGCCTTTGAG 2040 

AAAGGAAGCA ACCAAGTCAT TACAGGTCAA GGGACTATTA CCTCAGGCAA TCAAAAAGGT 2100 

TTTAGATTTA ATAATGTCTC TCTAAACGGC ACTGGCAGCG GACTGCAATT CACCACTAAA 2160 

AGAACCAATA AATACGCTAT CACAAATAAA TTTGAAGGGA CTTTAAATAT TTCAGGGAAA 2220 

GTGAACATCT CAATGGTTTT ACCTAAAAAT GAAAGTGGAT ATGATAAATT CAAAGGACGC 2280 

ACTTACTGGA ATTTAACCTC GAAAGTGGAT ATGATAAATT CAAAGGACGC CCTCACTATT 2340 

GACTCCAGAG GAAGCGATAG TGCAOGCACA CTTACCCAGC CTTATAATTT AAACGGTATA 2400 

TCATTCAACA AAGACACTAC CTTTAATGTT GAACGAAATG CAAGAGTCAA CTTTGACATC 2460 

AAGGCACCAA TAGGGATAAA TAAGTATTCT AGTTTGAATT ACGCATCATT TAATGGAAAC 2520

ATTTCAGTTT CGGGAGGOGG GAGTGTTGAT TTCACACTTC TCGCCTCATC CTCTAACOTC 2580

CAAACCCCCG GTGTAGTTAT AAATTCTAAA TACTTTAATG TTTCAACAGG GTCAAGTTTA 2640 

AGATTTAAAA CTTCAGGCTC AACAAAAACT GGCTTCTCAA TAGAGAAAGA TTTAACTTTA 2700 

AATGCCACCG GAGGCAACAT AACACTTTTG CAAGTTGAAG GCACCGATGG AATtSATTGGT 2760 

AAAGGCATTG TAGCCAAAAA AAACATAACC TTTGAAGGAG GTAAGATGAG GTTTGGCTCC 2820 

AGGAAAGCCO TAACAGAAAT CGAAGGCAAT GTTACTATCA ATAACAACGC TAACGTCACT 2880 

CTTATCGGTT CGGATrTTGA CAACCATCAA AAACCTTTAA CTATTAAAAA AGATGTCATC 2940 

ATTAATAGCG GCAACCTTAC CGCTGGAGGC AATATTGTCA ATATAGCCGG AAATCTTACC 3000 

GTTGAAAGTA ACGCTAATTT CAAAGCTATC ACAAATTTCA CTTTTAATGT AGGCGGCTTG 3060 

TTTGACAACA AAGGCAATTC AAATATTTCC ATTGCCAAAG GAGGGGCTCG CTTTAAAGAC 3120 

ATTGATAATT CCAAGAATTT AAGCATCACC ACCAACTCCA GCTCCACTTA CCGCACTATT 3180 

ATAAOCGGCA ATATAACCAA TAAAAACOGT GATTTAAATA TTACGAACGA AOGTAGTCAT 3240 

ACTGAAATGC AAATTGGCGG CGATGTCTCG CAAAAAOAAG GTAATCTCAC GATTTCTTCT 3300 

GACAAAATCA ATATTACCAA ACAGATAACA ATCAAGGCAG GTGTTGATGG GGAGAATTCC 3360 

GATTCAGACG CGACAAACAA TGCCAATCTA ACCATTAAAA CCAAAGAATT GAAATTAACG 3420 

CAAGACCTAA ATATTTCAGG TTTCAATAAA GCAGAGATTA CAGCTAAAGA TGGTAGTGAT 3480 

TTAACTATTG GTAACACCAA TAGTGCTGAT GGTACTAATG CCAAAAAAGT AACCTTTAAC 3540
CAGGTTAAAG ATTCAAAAAT CTCTGCTGAC GGTCACAAGG TGACACTACA CAGCAAAGTG 3600
GAAACATCCG GTAGTAATAA CAACACTGAA GATAGCAGTG ACAATAATGC CXXXTTTAACT 3660
ATCGATGCAA AAAATGTAAC AGTAAACAAC AATATTACTT CTCACAAAGC AGTGAGCATC 3720
TCTGCGACAA GTGGAGAAAT TACCACTAAA ACAGGTACAA CCATTAACX3C AACCACTGGT 3780
AACGTGGAGA TAACCGCTCA AACAGGTAGT ATCCTAGGTG GAATTGAGTC CAGCTCTGGC 3840
TCTGTAACAC TTACTGCAAC CGAGGGCGCT CTTGCTGTAA GCAATATTTC GGGCAACACC 3900
GTTACTGTTA CTGCAAATAG CGGTGCATTA ACCACTTTGG CAGGCTCTAC AATTAAAGGA 3960
ACCGAGAGTG TAACCACTTC AAGTCAATCA GGCGATATCG GCGGTACGAT TTCTGGTGGC 4020
ACAGTAGAGG TTAAAGCAAC CGAAAGTTTA ACCACTCAAT CCAATTCAAA AATTAAAGCA 4080
ACAACAGGCG AGGCTAACGT AACAAGTGCA ACAGGTACAA TTGGTGGTAC GATTTCCGGT 4140
AATACGGTAA atgttacx;gc AAACXSCTGOC OATTTAACAG TTGGGAATGQ CGCAGAAATT 4200
AATGCGACAG AAGGAGCTGC AACCTTAACT ACATCATCGG GCAAATTAAC TACCGAAGCT 4260
AGTTCACACA TTACTTCAGC CAAGGGTCAG GTAAATCTTT CAGCTCAGGA TGGTAGCGTT 4320
GCAGGAAGTA TTAATGCCGC CAATGTOACA CTAAATACTA CAGGCACTTT AACTACCGTG 4380
AAGGGTTCAA ACATTAATGC AACCAGCGGT ACCTTGGTTA TTAACGCAAA AGACGCTGAG 4440
CTAAATGGCG CAGCATTGGG TAACCACACA GTGGTAAATG CAACCAACGC AAATGGCTCC 4500
GGCAGCGTAA TCGCGACAAC CTCAAGCAQA GTGAACATCA CTGGGGATTT AATCACAATA 4560
AATGGATTAA ATATCATTTC AAAAAACGGT ATAAACACCG TACTGTTAAA AGGCGTTAAA 4620
ATTGATGTGA AATACATTCA ACCGGGTATA GCAAGCGTAG ATGAAGTAAT TGAAGCQAAA 4680
CGCATCCTTG AGAAGGTAAA AGATTTATCT GATGAAGAAA GAGAAGCGTT AGCTAAACTT 4740
GGCGTAAGTG CTGTACGTTT TATTGAGCCA AATAATACAA TTACAGTCGA TACACAAAAT 4800
GAATTTGCAA CCAGACCATT AAGTCGAATA GTGATTTCTG AAGGCAGGGC GTGTTTCTCA 4860
AACAGTGATG GCGCQACGGT GTGCGTTAAT ATCGCTQATA ACGGGCGQTA GOGGTCAGTA 4920
ATTGACAAGO TAGATTTCAT CCTGCAATGA AGTCATTTTA TTTTCGTATT ATTTACTQTG 4980
TGGGTTAAAG TTCAQTACGG GCTTTACCCA TCTTGTAAAA AATTACGGAG AATACAATAA 5040
AGTArrTTTA ACAGGTTATT ATTATGAAAA ATATAAAAAG CAGATTAAAA CTCAGTGCAA 5100
TATCAGTATT GCTTGGCCTG GCTTCTTCAT CATTGTATGC AOAAGAAGCG rTTTTAGTAA 5160
AAGGCTTTCA GTTATCTGGT GCACTTGAAA CTTTAAGTGA AGACGCCCAA CTGTCTGTAG 5220
CAAAATCTTT ATCTAAATAC CAAGaCTCOC AAACTTTAAC AAACCTAAAA ACAGCACAGC 5280
TTGAATTACA GGCTGTGCTA GATAAGATTG AGCCAAATAA GTTTGATGTG ATATTGCCAC 5340
AACAAACCAT TACX3GATGGC AATATTATGT TTGAGCTAGT CTCGAAATCA GCCGCAGAAA 5400
GCCAAGTTTT TTATAAGGCG AGCCAGGGTT ATAGTGAAGA AAATATCGCT CGTAGCCTGC 5460
CATCTTTGAA ACAAGGAAAA GTGTATGAAG ATGGTCGTCA GTGGTTCGAT TTGCGTGAAT 5520
TCAATATGGC AAAAGAAAAT CCACTTAAAG TCACTCGCGT GCATTACGAG TTAAACCCTA 5580
AAAACAAAAC CTCTGATTTG GTAGTTGCAG GTTTTTCGCC TTTTGGCAAA ACGCGTAGCT 5640
TTGTTTCCTA TGATAATTTC GGCGCAAGGG AGTTTAACTA TCAACGTGTA AGTCTAGGTT 5700
TTGTAAATGC CAATTTGACC GGACATGATG ATGTATTAAA TCTAAACGCA TTGACCAATG 5760
TAAAAGCACC ATCAAAATCT TATX3CGGTAG GCATAGGATA TACTTATCCG TTTTATGATA 5820
AACACCAATC CTTAAGTCTT TATACCAGCA TGAGTTATGC TGATTCTAAT GATATCGACG 5880
GCTTACCAAG TGCGATTAAT CGTAAATTAT CAAAAGGTCA ATCTATCTCT GCGAATCTOA 5940
AATGGAGTTA TTATCTCCCG ACATTTAACC TTGGAATGGA AGACCAGTTT AAAATTAATT 6000
TAGGCTACAA CTACCGCCAT ATTAATCAAA CATCCGAGTT AAACACCCTG GGTGCAACGA 6060
AGAAAAAATT TGCAGTATCA GGCGTAAGTQ CAGGCATTGA TGGACATATC CAATTTACCC 6120
CTAAAACAAT CTTTAATATT GATTTAACTC ATCATTATTA CGCGAGTAAA TTACCAGGCT 6180
CTTTTGGAAT GGAGCGCATT GGCGAAACAT TTAATCGCAG CTATCACATT AGCACAGCCA 6240
GTTTAGGGTT GAGTCAAGAG TTTGCTCAAG GTTGGCATTT TAGCAGTCAA TTATCGGGTC 6300
AGTTTACTCT ACAAGATATA AGTAGCATAG ATTTATTCTC TGTAACAGGT ACTTATGGCG 6360
TCAGAGGCTT TAAATACGGC GGTGCAAGTO GTQAGCGCGG TCTTGTATGG CGTAATGAAT 6420
TAAGTATGCC AAAATACACC CGCTTTCAAA TCAGCCCTTA TGCGTTTTAT GATGCAGGTC 6480
AC3TTCCGTTA TAATAGCGAA AATGCTAAAA CTTACGGCGA AGATATGCAC ACGGTATCCT 6540
CTOCOGGTTT AGGCATTAAA ACCTCTCCTA CACAAAACTT AAGCTTAGAT GCTTTTGTTG 6600
CTCGTGGCTT TGGAAATGCC AATAGTGACA ATTTGAATGQ CAACAAAAAA CGCACAAGCT 6660
CACCTACAAC CTTCTGGGGT AGATTAACAT TCAGTTTCTA ACCCTGAAAT TTAATCAACT 6720
GGTAAGCGTT CCGCCTACCA GTTTATAACT ATATGCTTTA CCCGCCAATT TACAGTCTAT 6780
ACGCAACCCT GTTTTCATCC TTATATATCA AACAAACTAA GCAAACCAAG CAAACCAAGC 6840
AAACCAAGCA AACCAAGCAA ACCAAGCAAA CCAAGCAAAC CAAGCAAACC AAGCAAACCA 6900
AGCAAACCAA QCAAACCAAG CAAACCAAGC AAACCAAGCA ATGCTAAAAA ACAATTTATA 6960
TGATAAACTA AAACATACTC CATACCATGQ CAATACAAGG GATTTAATAA TATGACAAAA 7020
GAAAATTTAC AAAGTGTTCC ACAAAATACG ACCQCTTCAC TTGTAGAATC AAACAACGAC 7080
CAAACTTCCC TGCAAATACT TAAACAACCA CCCAAACCCA ACCTATTACG CCTGGAACAA 7140
CATGTCGCCA AAAAAGATTA TGAGCTTGCT TGCCGCGAAT TAATGGCGAT TTTGGAAAAA 7200
ATGGACGCTA ATTTTGGAGG CGTTCACX3AT A7TGAATTTG ACGCACCXX3C TCAGCTGGCA 7260
TATCTACCCX5 AAAAACTACT AATTCATTTT GCCACTCGTC TCGCTAATGC AATTACAACA 7320
CTCTTTTCCG ACCCCGAATT GGCAATTTCC GAAGAAGGGG CATTAAAGAT GATTAGCCTG 7380
CAACGCTGGT TGACX3CTX3AT TTTTGCCTCT TCCCCCTACG TTAACGCAGA CCATATTCTC 7440
AATAAATATA ATATCAACCC AGATTCCGAA GGTGGCTTTC ATTTAGCAAC AGACAACTCT 7500
TCTATTGCTA AATTCTGTAT TTTTTACTTA CCOGAATCCA ATGTCAATAT GAGTTTAGAT 7560
GCGTTATGGG CAGGGAATCA ACAACTTTGT GCTTCATTGT GTTTTGCGTT GCAGTCTTCA 7620
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CGTTTTATTG GTACTGCATC TGCGTTTCAT AAAAGAGCGG 
AAAAAACTCG CCGAAATTGC TAATTTAGAT GAATTGCCTG 
TATATGCACT GCAGTTATGA TTTAGCAAAA AACAAGCACG 
GAACTTGTCC GCAAGCATAT CCTCACGCAA GGATGGCAAG 
GGTAAAAAGG ACX^GCAAACC TGTGATGATG GTACTGCTTG 
TCGATTTATC GCACGCATTC AACTTCAATG ATTGCTGCTC 
GGCTTAGGCC ATGAGGGCGT TGATAACATA GGTCGAGAAG 
ATCAGTAGCA ATAATATAAT GGAGAGACTG TTTTTTATCX: 
CAACCCGCAG TGTTCTATAT GCCAAGCATT GGCATGGATA 
AACACTCGGC TTGCCCCTAT TCAAGCTGTA GCCTTGGGTC 
GAATTTATTG ATTATGTCAT CGTAGAAQAT GATTATGTGG 
GAAACCCTTT TACGCTTACC CAAAGATGCC CTACCTTATG 
CAAAAAGTGG ATTATGTACT CAGGGAAAAC CCTGAAGTAG 
ACCACAATGA AATTAAACCC TGAATTmG CTAACATTGC 
AAAGTCAAAA TACATTTTCA TTTCGCACTT GGACAATCAA 
GTCAAATGGT TTATCGAAAG CTATTTAGGT GACGATGCCA 
TATCACGATT ATCTGGCAAT ATTGCOTGAT TQCGATATGC 
GGTAATACTA" ACXMCATAAT TOATATGGTT ACATTAGGTT 
GGGGATOAAG TACATGAACA TAITGATOAA GGTCTGTTTA 
TGGCTGATAG CCX3ACACACG AGAAACATAT ATTGAATGTG 
CATCAAGAAC GCCTTGAACT CCaTCGTTAC ATCATAGAAA 
TTTACAGGCG ACCCTCQTCC ATTGGGCAAA ATACTCCTTA 
CGGAAGCACT TGAGTAAAAA ATAACGGTTT TTTAAAGTAA 

GCarrXTAAA aacctctcaa aaatcaaccg cacttttatc 

TGACAGTTTA TCTCTTTCTT AAAATACCCA TAAAATTGTG 
TTCAATTGTT GATAOGGCAA ACTAAAGACQ GCGCGTrCTT 
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9323 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CGCCACrrCA ArrrrGGATT gttgaaattc aactaaccaa aaagtgcggt taaaatctgt 



TGGTTTTACA 
CAftATATCCT 

atgttaagcg 
accgctacct 
aacattttaa 
gagaaaaatt 
tgtttgacga 
gtaaacagtg 
ttagcacgat 

ATCCrOCCAC 

gcagtgaaga 
taccatctgc 
tcaatatcgg 
aagaaatcag 
cagqcttgac 
ctgcacatcc 
tactaaatcc 

TAGTTGGTGT 

aacgcttagg 

CTTTGCGTCT 
ACAACGGCTT 
AGAAAACAAA 
AAGTGCGGTT 
TTTATAACGC 
GCAATAGTTG 
CGGCAGTCAT 



GTGGTTTCCT 
TCATGATGTA 
TCCATTAAAC 
TTACACCTTA 
TTCGGGACAT 
CTATTTAGTC 
GTTCTTTGAA 
CGAAACTTTC 
TTTTGTGAGC 
TAOGCATTCT 
TTGTTTTAGC 
ACTCGCCCCA 
TATTGCCQCT 
AGATAAAGCT 
ACACCCTTAT 
CCACGCACCT 
GTTTCCTTTC 
ATGCAAAACG 
ACTACCAGAA 
AGCAGAAAAC 
ACAAAAGCTT 
TOAATGGAAG 
AATTTTCAAA 
TCCCGCGCQC 
GGTAATCAAA 
C 



7680 

7740 

7800 

7860 

7920 

7980 

8040 

8100 

8160 

8220 

8280 

8340 

8400 

8460 

8520 

8580 

8640 

8700 

8760 

8820 

8880 

8940 

9000 

9060 

9120 

9171 



60 
GGAGAAAATA GGTTGTAGTG AAGAACGAGG 
TTGGGCATTG GTTGGCGTTT CTTTTTCGGT 
CAATCCACCA ACAACTTTAC CGTTGGTTTT 
GCGAATACGT AATCCCATTT TTTGTTTAGC 
GTTGCCCAAA AATAAATTTT GATGTTCTAA 
TTCAATACCT ATTTGTGGCG AAATCGCCAA 
TCCCACTCAA ATCAACTGGT TAAATATACA 
ATGACAAACA ACAATTACAA CACCTTTTTT 
AGTATAAATC CGCCATATAA AATGGTATAA 
TTTCATCTTT CATCTTTCAT CTTTCATCTT 
ATCTTTCATC TTTCATCTTT CACATGAAAT 
GAATGAAGAG GGAGCTGAAC GAACGCAAAT 
TAGGAGAAAA TATGAACAAG ATATATCGTC 
TTGCTGTGTC TGAATTGGCA CGGGGTTCTG 
CTGCTCGCAT GAAAGTQCGT CACTTAGCGT 
TAGGTGTAAC ATCTATTCCA CAATCTOTTT 
TtSAAATGGTG CAGTTTITAC 
— ACGCTATCAT_i.TAATTGGAAA -CAATTTAACA" 
AAGAAAACAA CAACTCCGCC GTATTCAACC 
AAGGGATTTT AGATTCTAAC GGACAAGTCT 
GTAAAGACGC AATTATTAAC ACTAATGGCT 
AAAACATCAA GGCXMTGTAAT TTCACCTTCG 
TTGTGAATCA CGGTTTAATT ACTGTCGGTA 
AAGTGAAAAA CGAGGGTQTa ATTAGCGTAA 
AAAAAATCAC CATCAGCOAT ATAATAAACC 
AAAATGAAGC GGTCAATCTG GGCGATATTT 
CTGCCACTAT TCGAAACCAA GQTAAACTTT 
GCAATATTGT TCTTTCCGCC AAAGAGGGTG 
AAAATCAGCA AGCTAAAGGC GGCAAGCTGA 
CAGGTGCAGT TATCGACXTTT TCAGGTAAAG 
AGCGCGGCGA AGGTAAAAAC GGCATTCAAT 
CAACCATCAA TGTATCAGGC AAAGAAAAAG 
CGrrAATTGA CGGCAATATT AACGCTCAAG 
TTGTGGAGAC ATCGGGGCAT TATTTATCCA 




TAATTGTTCA 
TAATAGTAAA 
AAGCGTTAAT 
AAGAAAATGA 
AATCATAAAT 
TTTTAATTCA 
AGATAATAAA 
GCAGTCTATA 
TCTTTCATCT 
TCATCTTTCA 
GATGAACCGA 
GATAAAGTAA 
TCAAATTCAG 
ACCATTCCAC 
TAAAGCCACT 
TAGCAAGCGG 
JSTAATAAAAC 
TCGACCAAAA 
GTGTTACATC 
TTTTAATCAA 
TTACGGCTTC 
AGCAAACCAA 
AAGACGGCAG 
ATGGTUGCAG 
CAACCATTAC 
TTGCCAAAGG 
CTQCTGATTC 
AAGCGGAAAT 
TGATAAAGTC 
AAGGGGGAOA 
TAGCAAAGAA 
GCGGACGOGC 
GTAGTGGTGA 
TTGACAGCAA 



AAAGGATAAA 
TTATATTCTG 
GTAAGTTCTT 
TCGGGATAAT 
TTTGCAAGAT 
ATTTCTTGTA 
AATAAATCAA 
TGCAAATATT 
TTCATCTTTC 
TCTTTCATCT 
GGGAAGGGAG 
TTTAATTGTT 
CAAACGCCTG 
AGAAAAAGGC 
TTCCGCTATG 
CAATTTAACA 
CATTATCCGC 
TGAAATGGTG" 
TAACCAAATC 
CCCAAATGGT 
TACGCTAGAC 
AGATAAAGCG 
TGTAAATCTT 
CATTTCTTTA 
TTACAGCATT 
CGGTAACATT 
TGTAAGCAAA 
TGGCGGTGTA 
CGATAAAGTC 
AACTTACCTT 
AACCTCTTTA 
TATTGTGTGG 
TATCGCTAAA 
TGGAATTGTT 



GCTCTCTTAA 
GACGACTATG 
GCTCTTCTTG 
CATAATAGGT 
ATTGTGGCAA 
GCATAATATT 
GATTTTTGTG 
TTAAAAAAAT 
ATCTTTCATC 
TTCATCTTTC 
GGAGGGGCAA 
CAACTAACCT 
AATGCTTTGG 
AGCGAAAAAC 
TTACTATCTT 
TCGACCAAAA 
AACAGTGTTG 
"CAGTTTTTAC 
TCCCAATTAA 
ATCACAATAG 
ATTTCTAACG 
CTCGCTGAAA 
ATTOGTGaCA 
CTCGCAGGGC 
QCCGCQCCTG 
AATGTCCQTQ 
GATAAAAGCG 
ATTTCCQCTC 
ACATTAAAAA 
GGCGGTGACG 
GAAAAAGGCT 
GGCGATATTG 
ACCGGTGGTT 
AAAACAAAAG 



120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
AGTGGTTGCT AGACCCTGAT GATGTAACAA TTGAAGCCGA AGACCCCCTT CGCAATAATA 2160 

CCGGTATAAA TGATGAATTC CCAACAGGCA CCGGTGAAGC AAGCGACCCT AAAAAAAATA 2220 

GCGAACTCAA AACAACGCTA ACCAATACAA CTATTTCAAA TTATCTGAAA AACGCCTGGA 2280 

CAATGAATAT AACGGCATCA AGAAAACTTA CCGTTAATAG CTCAATCAAC ATCGGAAGCA 2340 

ACTCCCACTT AATTCTCCAT AGTAAAGGTC AGCGTGGCGG AGGCGTTCAG ATTGATGGAG 2400 

ATATTACTTC TAAAGGOGGA AATTTAACCA TTTATTCTGG CTGGATGGGTT GATGTTCATA 2460 

AAAATATTAC GCTTGATCAG GGITlViTAA ATATTACCGC CGCTTCCGTA GCTTTTGAAG 2520 

GTGGAAATAA CAAAGCACGC GACGCGGCAA ATGCTAAAAT TGTCGCCCAG GGCAGTGTAA 2580 

CCATTACAGG AGAGGGAAAA GATTTCAGGG CTAACAACGT ATCTTTAAAC GGAACGGGTA 2640 

AAGGTCTGAA TATCATTTCA TCAGTGAATA ATTTAACCCA CAATCTTAGT GGCACAATTA 2700 

ACATATCTGG GAATATAACA ATTAACCAAA CTACGAGAAA GAACACCTCG TATTGGCAAA 2760 

CCAGCCATGA TTCGCACTGG AACGTCAGTG CTCTTAATCT AGAGACAGGC GCAAATTTTA 2820 

CCTTTATTAA ATACATTTCA AGCAATAGCA AAGGCTTAAC AACACAGTAT AGAAGCTCTG 2880 

CAGGOQTGAA TTTTAACGGC GTAAATGGCA ACATGTCATT CAATCTCAAA GAAGGAGCGA 2940 

AAGTTAATTT CAAATTAAAA CCAAACGAGA ACATGAACAC AAGCAAACCT TTACCAATTC 3000 

GGTTTTTAGC CAATATCACA GCCACTGGTG GGGGCTCTGT TOrTTTTGAT ATATATGCCA 3060 

ACCATTCTGG CAGAGGGGCT GAGTTAAAAA TGAGTGAAAT TAATATCTCT AACGGCGCTA 3120 

Ai ^i-iACCtt AAATTCCCAT GTTCGCGGCG ATGACGCTTr~TAAAA^ 3180 

CCATAAATGC AACCAATTCA AATTTCAGCC TCAGACAGAC GAAAGATGAT TTTTATGACG 3240 

GGTACGCACG CAATGCCATC AATTCAACCT ACAACATATC CATTCTGGGC GGTAATGTCA 3300 

CCCTTGGTGG ACAAAACTCA AGCAQCAQCA TTACGGGGAA TATTACTATC GAGAAAGCAG 3360 

CAAATGTTAC GCTAGAAGCC AATAACOCCC CTAATCAGCA AAACATAAGG GATAGAQTTA 3420 

TAAAACTTGG CAGCTTQCTC CrTTAATOOOA GTTTAAGTTT AACTOGCXSAA AATGCAGATA 3480 

TTAAAGQCAA TCTCACTATT TCAGAAAGOG CCACTTTTAA AGOAAAGACT AGAGATACCC 3540 

TAAATATCAC CGGCAATTTT ACCAATAATG GCACTGCCGA AATTAATATA ACACAAGGAG 3600 

TGGTAAAACT TGGCAATGTT ACCAATGATG GTGATTTAAA CATTACCACT CACGCTAAAC 3660 

GCAACCAAAG AAOCATCATC GGCGGAGATA TAATCAACAA AAAAGGAAGC TTAAATATTA 3720 

CAGACAGTAA TAATGATGCT GAAATCCAAA TTGGCGGCAA TATCTCGCAA AAAGAAGGCA 3780 

ACCTCACGAT TTCTTCCGAT AAAATTAATA TCACCAAACA QATAACAATC AAAAAGGGTA 3840 

TTGATGGAGA GGACTTCTAGT TCAGATGOGA CAAGTAATGC CAACCTAACT ATTAAAACCA 3900 

AAGAATTX3AA ATTGACAGAA GACCTAAGTA TTTCAGGTTT CAATAAAGCA GAGATTACAG 3960 

CCAAAGATGG TAGAGATTTA ACTATTGGCA ACAGTAATGA COGTAACAGC GGTGCCGAAG 4020 

CCAAAACAGT AACTTTTAAC AATGTTAAAG ATTCAAAAAT CTCTGCTGAC GGTCACAATG 4080 

TGACACTAAA TAGCAAAGTG AAAACATCTA GCAGCAATGG CGGACGTGAA AGCAATAGCG 4140 
ACAACGATAC CGGCTTAACT ATTACTGCAA AAAATGTAGA AGTAAACAAA GATATTACTT 
CTCTCAAAAC AGTAAATATC ACCGCGTCGG AAAAGGTTAC CACCACAGCA GGCTCGACCA 
TTAACGCAAC AAATGGCAAA GCAAGTATTA CAACCAAAAC AGGTGATATC AGCGGTACGA 
TTTCCGGTAA CACGGTAAGT GTTAGCGCGA CTGGTGArrT AACCACTAAA TCCGGCTCAA 
AAATTGAAGC GAAATCGGGT GAGGCTAATG TAACAAGTGC AACAGGTACA ATTCGCGGTA 
CAATTTCCGG TAATACGGTA AATGTTACGG CAAACGCTX3G CGATTTAACA GTTGGGAATG 
GCGCAGAAAT TAATGCGACA GAAGGAGCTG CAACCTTAAC CGCAACAGGG AATACCTTCA 
CTACTGAAGC CGGTTCTAGC ATCACTTCAA CTAAGGGTCA GGTAGACCTC TTGGCTCAaA 
ATGGTAGCAT CGCAGGAAGC ACTAATGCTG CTAATGTOAC ATTAAATACT ACAGGCACXTT 
TAACCACCGT GGCAGGCTCG GATATTAAAG CAACCAGCGG CACCTTGaTT ATTAACGCAA 
AAGATGCTAA GCTAAATGGT GATGCATCAG GTGATAGTAC AGAAGTGAAT GCAGTCAACG 
ACTGGGGATT TGGTAGTGTG ACTGCGGCAA CCTCAAGCAG TGTX3AATATC ACTGGGGATT 
TAAACACAGT AAATGGGTTA AATATCATTT CGAAAGATGG TAGAAACACT GTGCGCTTAA 
GAGGCAAGGA AATTGAGGTG AAATATATCC AGCCAGGTGT AGCAAGTGTA GAAGAAGTAA 
TTGAAGCGAA ACGCGTCCTT GAAAAAGTAA AAGATTTATC TGATGAAGAA AGAGAAACAT 
TAGCTAAACT TGGTGTAAGT GCTGTACGTT TTGTTGAGCC AAATAATACA ATTACAGTCA 
ATACACAAAA TGAATTTACA ACCAOACOGT CAAGTCAAGT GATAATTTCT -GAAGGTAAGG 
CGTGTTTCTC AAGTGGTAAT GGCaCAC?GAG TATGTACCAA TGITCCTOAC OATOGACAGC 
CGTAGTCAGT AATTGACAAG GTAGATITCA TCCTGCAATG AAGTCATTTT ATTTTCGTAT 
TATTTACTGT GTCGGTTAAA GTTCAGTAOG GGCTTTACCC ATCTTGTAAA AAATTACGGA 
GAATACAATA AAGTATTTTT AACAGGTTAT TATTATGAAA AATATAAAAA GCAGATTAAA 
ACTCAGTGCA ATATCAGTAT TGCTTGGCCT GGCTTCTTCA TCATTGTATC CAGAAGAAGC 
GTTTTTAGTA AAAGGCTTTC AGTTATCTGG TGCACTTOAA ACTTTAAGTG AAGACGCCCA 
ACTCTCTGTA GCAAAATCTT TATCTAAATA CCAAGGCTCG CAAACTTTAA CAAACCTAAA 
AACAGCACAG CTTGAATTAC AGGCTGTGCT AGATAAOATT GAGCCAAATA AATTTGATXST 
GATATTGCCG CAACAAACCA TTACOGATGG CAATATCATG -rrraAGCTAG TCTCGAAATC 
AGCCGCAGAA AGCCAAGTTT TTTATAAGGC GAQCCAGGQT TATAQTOAAO AAAATATCGC 
TCGTAGCCTG CCATCTTTGA AACAAGOAAA AGTGTATCAA GATX3GTCGTC AOTQGTTOQA 
TTTGCGTGAA TTTAATATGG CAAAAGAAAA CCCGCTTAAG GTTACCCGTG TACATTACGA 
ACTAAACCCT AAAAACAAAA CCTCTAATTT GATAArTOOQ GGCTTCTOGC CTTTTGGTAA 
AACGCGTAGC TTTATTTCTT ATGATAAITT CGGCGCGAGA GAGTITAACT ACCAACQTOT 
AAGCTTGGGT TTTOTTAATG CCAATTTAAC TGGTCATCAT GATGTGTTAA TTATACCAGT 
ATGAGTTATX5 CTGATTCTAA TGATATCGAC GGCTTACCAA GTCCGATTAA TCGTAAATTA 
TCAAAAGGTC AATCTATCTC TCCGAATCTG AAATGGAGTT ArrTATCTCCC AACATTTAAC 
CTTGGCATGG 


AAGACCAATT 


TAAAATTAAT 


TTAGGCTACA 


ACTACCGCCA 


TATTAATCAA 


6240 


ACCTCCGCGT 


TAAATCGCTT 


GGGTGAAACG 


AAGAAAAAAT 


TTGCAGTATC 


AGGCGTAAGT 


6300 


GCAGGCATTG 


ATGGACATAT 


CCAATTTACC 


CCTAAAACAA 


TCrriAATAT 


TGATTTAACT 


6360 


CATCATTATT 



ACGCGAGTAA 


ATTACCAGGC 


TCTTTTGGAA 


TGGAGCGCAT 


TX3GCGAAACA 


6420 


TTTAATCGCA 


GCTATCACAT 


TAGCACAGCC 


AGTTTAGGGT 


TGAGTCAAGA 


GTTTGCTCAA 


6480 


GGTTGGCATT 


TTAGCAGTCA 


ATTATCAGGT 


CAATTTACTC 


TACAAGATAT 


TAGCAGTATA 


6540 


GATTTATTCT 


CTGTAACAGG 


TACTTATGGC 


GTCAGAGGCT 


TTAAATACGG 


CGGTGCAAGT 


6600 


GGTGAGCGCG 


GTCTTGTATG 


GCGTAATGAA 


TTAAGTATGC 


CAAAATACAC 


CCGCTTCCAA 


6660 


ATCAGCCCTT 


ATGCGT"ri"l'A 


TGATGCAGGT 


CAGTTCCGTT 


ATAATAGCGA 


AAATGCTAAA 


6720 


ACTTACGGCG 


AAGATATGCA 


CACGGTATCC 


TCTGCGGGTT 


TAGGCATTAA 


AACCTCTCCT 


6780 


ACACAAAACT 


TAAGCCTAGA 


TGCTTTTGTT 


GCTCGTCGCT 


TTGCAAATGC 


CAATAOTGAC 


6840 


AATTTGAATG 


GCAACAAAAA 


ACGCACAAGC 


TCACCTACAA 


CCTVCTGGGQ 


GAGATTAACA 


6900 


TTCAGTTTCT 


AACCCTGAAA 


TTTAATCAAC 


TGGTAAGCGT 


TCCGCCTACC 


AGTTTATAAC 


6960 


TATATGCTTT 


ACCCGCCAAT 


TTACAGTCTA 


TAGGCAACCC 


TGTTTTTACC 


CTTATATATC 


7020 


AAATAAACAA 


GCTAAGCTGA 


GCTAAGCAAA 


CCAAGCAAAC 


TCAAGCAAGC 


CAAGTAATAC 


7080 


TAAAAAAACA 


ATTTATATGA 


TAAACTAAAG 


TATACTCCAT 


GCCATGGCGA 


TACAAGGGAT 


7140 


TTAATAATAT 


GACAAAAGAA 


AATTTGCAAA 


ACGCTCCTCA 


AGATGCGACC 


GCTTTACTTG 


7200 


CGGAATTAAG 


CAACAATCAA 


ACTCCCCTGC 


GAATATTTAA 


ACAACCACGC 


AAGCCCAGCC 


7260 


TATTACGCTT 


GGAACAACAT 


ATCX3CAAAAA 


AAGATTATGA 


GTTTGCTTGT 


CGTGAATTAA 


7320 


TGGTGATTCT 


GGAAAAAATG 


GACGCTAATT 


TTGGAGGCGT 


TCACGATATT 


GAATTTGACG 


7380 


CACCCGCTCA 


GCTGGCATAT 


CTACCCX3AAA 


AATTACTAAT 


TTATTTTGCC 


ACTCGTCTCG 


7440 


CTAATGCAAT 


TACAACACTC 


TTTTCCGACC 


CCGAATTGGC 


AATTTCTGAA 


GAAGGGGCGT 


7500 


TAAAGATGAT 


TAOCCTGGAA 


CGCTQQTTGA 


CGCTGATTTT 


TGCCTCTTCC 


CCCTACXSTTA 


7560 


ACGCAGACCA 


TATTCTCAAT 


AAATATAATA 


TCAACCCAGA 


TTCCGAAGGT 


GGCTTTCATT 


7620 


TAGCAACAGA 


CAAd^-i-iTtri- 


ATTGCTAAAT 


TCTGTATTTT 


TTACTTACCC 


GAATCCAATG 


7680 




#WMM4 ft ft ffW^V^ft>^ 

1 1 lAGATGCG 


TTATGGGCAG 


GGAATCAACA 


ACTTTGTGCT 


TCATIVIXi'lT 


7740 


TTGCGTTGCA 


GTCTTCACGT 


TTTATTGGTA 


CCGCATCTGC 


GITIXJATAAA 


AGAGCGGTGG 


7800 


TTTTACAGTG 


GTTTCCTAAA 


AAACTCGCCX3 


AAATTGCTAA 


TTTAGATGAA 


TTGCCTGCAA 


7860 


ATATCCTTCA 


TGATGTATAT 


ATGCACTGCA 


GTTATGATTT 


AQCAAAAAAC 


AAGCACGATG 


7920 


TTAAGCGTCC 


ATTAAACX3AA 


CTlXjrVCCGCA 


AGCATATCCT 


CACGCAAGGA 


TGGCAAOACC 


7980 


GCTACCTTTA 


CACCTTAGGT 


AAAAAGGACG 


GCAAACCTGT 


GATGATGGTA 


CTGCTTGAAC 


8040 


ATTTTAATTC 


GGGACATTCG 


ATTTATOGTA 


CACATTCAAC 


TTCAATGATT 


GCTGCTCGAG 


8100 


AAAAATTCTA 


TTTAGTGGGC 


TTAGGCCATG 


AGGGCGTTGA 


TAAAATAGGT 


CGAGAAGTGT 


8160 


TTGACGAGTT 


CTTTGAAATC 


AGTAGCAATA 


ATATAATGGA 


GAGACTGTTT 


TTTATCCGTA 


8220 
AACAGTGCGA 


AACTTTCCAA CCCGCAGTGT 


TCTATATGCC 


AAGCATTGGC 


ATGGATATTA 


8280 


CCACGATTTT 


TGTGAGCAAC ACTCGGCTTG 


CCCCTATTCA 


AGCTGTAGCC 


CTGGGTCATC 


8340 


CTGCCACTAC 


GCATTCTGAA TTTATTGATT 


ATGTCATCGT 


AGAAGATGAT 


TATGTGGGCA 


8400 


GTGAAGATTG 


TTTCAGCGAA ACCCTTTTAC 


GCTTACCCAA 


AGATGCCCTA 


CCTTATQTAC 


8460 


CTTCTGCACT 


CGCCCCACAA AAAGTGGATT 


ATGTACTCAG 


GGAAAACCCT 


GAAGTAGTCA 


8520 


ATATCGGTAT 


TGCCGCTACC ACAATGAAAT 


TAAACCCTGA 


ATTTTTGCTA 


ACATTGCAAG 


8580 


AAATCAGAGA 


TAAAGCTAAA GTCAAAATAC 


ATTTTCATTT 


CGCACTTGGA 


CAATCAACAG 


8640 


GCTTGACACA CCCTTATGTC AAATGGTTTA 


TCGAAAGCTA 


TTTAGGTGAC 


GATGCCACTG 


8700 


CACATCCCCA 


CXKIACCTTAT CACGATTATC 


TGGCAATATT 


QCGTGATTGC 


GATATGCTAC 


8760 


TAAATCCGTT 


TCCTTTCGGT AATACTAACX3 


GCATAATTGA 


TATCGTTACA 


TTAGG'ITITAG 




TTGGTGTATG 


CAAAACGGGG GATGAAGTAC 


ATGAACATAT 


TGATGAAGGT 




8880 


GCTTAGGACT 


ACCAGAATGG CTGATAGCCG 


ACACACGAGA 


AACATATATT 


GT^VTGTG^T^*!' 




TGCGTCTAGC 


AGAAAACCAT CAAGAACGCC 


TTGAACTCCG 


TCGTTACATC 


ATAGAAAAt^ 




ACGGCTTACA 


AAAGcrrriT acaggcgacc 


CTCGTCCATT 


GGGCAAAATA 




9060 


AAACAAATGA ATGGAAGCGG AAGCACTTGA 


GTAAAAAATA 


ACGGTTTTTT 


AAAGTAAAAG 


9120 


TGCGGTTAAT 


TTTCAAAGCG TTTTAAAAAC 


CTCTCAAAAA 


TCAACCGCAC 


TTTTATCTTT 


9180 


ATAACGATCC 


CGCACGCTGA CAGTTTATCA GCCTCCCGCC 


ATAAAACrCC 


GCCTTTCATX3 


9240 


GCCkSAGATfr 


TAGCCAAAACTGCCJ^^ " 


TAAAGGCTAA 


AATCACCAAA 


TTGCACCACA 


9300 


AAATCACCAA 


TACCCACAAA AAA 








9323 



(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4794 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATGAACAAGA TATATCGTCT CAAATTCAGC AAACGCCTGA ATGCTTTCGT TGCTGTGTCT 60 

GAATTGACAC GGGGTTGTGA CCATTCCACA GAAAAAGGCA GTGAAAAACC TGTTCGTACG 120 

AAAGTACGCC ACTTGGCGTT AAAGCCACTT TCCGCTATAT TGCTATCTTT GGGCATGGCA 180 

TCCATTCCGC AATCTGTTTT AGCGAGCGGT TTACAGGGAA TGAGCGTCGT ACACGGTACA 240 

GCAACCATGC AAGTAGACGG CAATAAAACC ACTATCCGTA ATAGCGTCAA TGCTATCATC 300 

AATTGGAAAC AATTTAACAT TGACCAAAAT GAAATGGTGC AGTTTTTACA AGAAAGCAGC 360 

AACTCTGCCG TTTTCAACCG TGTTACATCT GACCAAATCT CCCAATTAAA AGGGATTTTA 420 
GATTCTAACG GACAAGTCTT TTTAATCAAC CCAAATGGTA TCACAATAGG TAAAGACGCA 480 

ATTATTAACA CTAATGGCTT TACTGCTTCT ACGCTAGACA TTTCTAACGA AAACATCAAG 540 

GCGCGTAATT TCACCCTTGA GCAAACCAAG GATAAAGCAC TCGCTGAAAT CGTGAATCAC 600 

GGTTTAATTA CCGTTGGTAA AGACGGTAGC GTAAACCTTA TTGGTGGCAA AGTGAAAAAC 660 

GAGGGCGTGA TTAGCGTAAA TGGCGGTAGT ATTTCTTTAC TTGCAGGGCA AAAAATCACC 720 

ATCAGCGATA TAATAAATCC AACCATCACT TACAGCATTG CTGCACCTGA AAACGAAGCG 780 

ATCAATCTGG GCGATATTTT TGCCAAAGGT GGTAACATTA ATGTCCGCGC TGCCACTATT 840 

CGCAATAAAG GTAAACTTTC TGCCGACTCT GTAAGCAAAG ATAAAAGTGG TAACATTGTT 900 

CTCTCTGCCA AAGAAGGTGA AGCGGAAATT GGCGGTGTAA TTTCCGCTCA AAATCAGCAA 960 

GCCAAAGGTG GTAAGTTGAT GATTACAGGC GATAAAGTTA CATTGAAAAC GGGTGCAGTT 1020 

ATCGACCTTT CGGGTAAAGA AGGGGGAGAA ACTTATCTTG GCGGTGACGA GCGTGGCGAA 1080 

GGTAAAAACG GCATTCAATT AGCAAAGAAA ACCACTTTAG AAAAAGGCTC AACAATTAAT 1140 

GTGTCAGGTA AAGAAAAAGG TGGGCGCGCT ATTGTATGGG GCGATATTGC GTTAATTGAC 1200 

GGCAATATTA ATGCCCAAGG TAAAGATATC GCTAAAACTG GTGGTTTTGT GGAGACGTCG 1260 

GGGCATTACT TATCCATTGA TGATAACGCA ATTGTTAAAA CAAAAGAATG GCTACTAGAC 1320 

CCAGAGAATG TGACTATTGA AGCTCCTTCC GCTTCTCGCG TCGAGCTGGG TGCCGATAGG 1380 

AATTCCCACT CGGCAGAGGT GATAAAAGTG ACCCTAAAAA AAAATAACAC CTCCTTCACA 1440 

ACACTAACCA ATACAACCAT TTCAAATCTT CTGAAAAGTG CCCACGTGGT GAACATAACG 1500 

GCAAGGAGAA AACTTACCGT TAATAGCTCT ATCAGTATAG AAAGAGGCTC CCACTTAATT 1560 

CTCCACAGTG AAGGTCAGGG CGGTCAAGGT GTTCAGATTG ATAAAGATAT TACTTCTGAA 1620 

GGCGGAAATT TAACCATTTA TTCTGGCGGA TGGGTTGATG TTCATAAAAA TATTACGCTT 1680 

GGTAGCGGCT TTTTAAACAT CACAACTAAA GAAGGAGATA TCGCCTTCGA AGACAAGTCT 1740 

GGACGGAACA ACCTAACCAT TACAGCCCAA GGGACCATCA CCTCAGGTAA TAGTAACGGC 1800 

TTTAGATTTA ACAACGTCTC TCTAAACAGC CTTGGCGGAA AGCTGAGCTT TACTGACAGC 1860 

AGAGAGGACA GAGGTAGAAG AACTAAGGGT AATATCTCAA ACAAATTTGA CGGAACGTTA 1920 

AACATTTCCG GAACTGTAGA TATCTCAATG AAAGCACCCA AAGTCAGCTG GTTTTACAGA 1980 

GACAAAGGAC GCACCTACTG GAACGTAACC ACTTTAAATG TTACCTCGGG TAGTAAATTT 2040 

AACCTCTCCA TTGACAGCAC AGGAAGTGGC TCAACAGGTC CAAGCATACG CAATCCAGAA 2100 

TTAAATGGCA TAACATTTAA TAAAGCCACT TTTAATATCG CACAAGGCTC AACAGCTAAC 2160 

TTTAGCATCA AGGCATCAAT AATGCCCTTT AAGAGTAACG CTAACTACGC ATTATTTAAT 2220 

GAAGATATTT CAGTCTCAGG GGGGGGTAGC CTTAATTTCA AACTTAACGC CTCATCTAGC 2280 

AACATACAAA CCCCTGGOGT AATTATAAAA TCTCAAAACT TTAATGTCTC AGGAGGGTCA 2340 

ACTTTAAATC TCAAGGCTGA AGGTTCAACA GAAACCGCTT TTTCAATAGA AAATGATTTA 2400 

AACTTAAACG CCACCGGTGG CAATATAACA ATCAGACAAG TOGAGGGTAC CGATTCACGC 2460 
GTCAACAAAG GTGTCGCAGC CAAAAAAAAC ATAACTTTTA AAGGGGGTAA TATCACCTTC 2520 

GGCTCTCAAA AAGCCACAAC AGAAATCAAA GGCAATGTTA CCATCAATAA AAACACTAAC 2580 

GCTACTCTTT GTGGTGCGAA TTTTGCCGAA AACAAATCGC CTTTAAATAT AGCAGGAAAT 2640 

GTTATTAATA ATCGCAACCT TACCACTGCC GGCTCCATTA TCAATATAGC CGGAAATCTT 2700 

ACTGTTTCAA AAGGCGCTAA CCTTCAAGCT ATAACAAATT ACACTTTTAA TGTAGCCGGC 2760 

TCATTTGACA ACAATGGCGC TTCAAACATT TCCATTGCCA GAGGAGGGGC TAAATTTAAA 2820 

GATATCAATA ACACCAGTAG CTTAAATATT ACCACCAACT CTGATACCAC TTACCGCACC 2880 

ATTATAAAAG GCAATATATC CAACAAATCA GGTGATTTGA ATATTATTGA TAAAAAAAGC 2940 

GACGCTGAAA TCCAAATTGG CGGCAATATC TCACAAAAAG AAGGCAATCT CACAATTTCT 3000 

TCTGATAAAG TAAATATTAC CAATCAGATA ACAATCAAAG CAGGCGTTCA AGGGGGGCGT 3060 

TCTGATTCAA GTGAGGCAGA AAATGCTAAC CTAACTATTC AAACCAAAGA GTTAAAATTG 3120 

GCAGGAGACC TAAATATTTC AGGCTTTAAT AAAGCAGAAA TTACAGCTAA AAATGGCAGT 3180 

GATTTAACTA TTGGCAATGC TAGCGGTGGT AATGCTGATG CTAAAAAAGT GACTTTTGAC 3240 

AAGGTTAAAG ATTCAAAAAT CTCGACTGAC GGTCACAATG TAACACTAAA TAGCGAAGTG 3300 

AAAACGTCTA ATGGTAGTAG CAATGCTGGT AATGATAACA GCACCGGTTT AACCATTTCC 3360 

GCAAAAGATG TAACGGTAAA CAATAACGTT ACCTCCCACA AGACAATAAA TATCTCTGCC 3420 

GCAGCAGGAA ATGTAACAAC CAAAGAAGGC ACAACTATCA ATGCAACCAC AGGCAGCGTG 3480 

GAAGTAACTG CTCAAAATGG TACAATTAAA GGCAACATTA CCTCGCAAAA TGTAACAGTC 3540 

ACAGCAACAG AAAATCTTGT TACCACAGAG AATGCTGTCA TTAATGCAAC CAGCGGCACA 3600 

GTAAACATTA GTACAAAAAC AGGGGATATT AAAGGTGGAA TTGAATCAAC TTCCGGTAAT 3660 

GTAAATATTA CAGCGAGCGG CAATACACTT AAGGTAAGTA ATATCACTGG TCAAGATGTA 3720 

ACAGTAACAG CGGATGCAGG AGCCTTGACA ACTACAGCAG GCTCAACCAT TAGTGCGACA 3780 

ACAGGCAATG CAAATATTAC AACCAAAACA GGTGATATCA ACGGTAAAGT TGAATCCAGC 3840 

TCCGGCTCTG TAACACTTGT TGCAACTGGA GCAACTCTTG CTGTAGGTAA TATTTCAGGT 3900 

AACACTGTTA CTATTACTGC GGATAGCGGT AAATTAACCT CCACAGTAGG TTCTACAATT 3960 

AATGGGACTA ATAGTGTAAC CACCTCAAGC CAATCAGGCG ATATTGAAGG TACAATTTCT 4020 

GGTAATACAG TAAATGTTAC AGCAAGCACT GGTGATTTAA CTATTGGAAA TAGTGCAAAA 4080 

GTTGAAGCGA AAAATGGAGC TGCAACCTTA ACTGCTGAAT CAGGCAAATT AACCACCCAA 4140 

ACAGGCTCTA GCATTACCTC AAGCAATGGT CAGACAACTC TTACAGCCAA GGATAGCAGT 4200 

ATCGCAGGAA ACATTAATGC TCCTAATGTG ACGTTAAATA CCACAGGCAC TTTAACTACT 4260 

ACAGGGGATT CAAAGATTAA CGCAACCAGT GGTACCTTAA CAATCAATGC AAAAGATGCC 4320 

AAATTAGATG GTGCTGCATC AGGTGACCGC ACAGTACTAA ATGCAACTAA CGCAAGTGGC 4380 

TCTGGTAACG TGACTGCGAA AACCTCAAGC AGCGTGAATA TCACCGGGGA TTTAAACACA 4440 

ATAAATGGGT TAAATATCAT TTCGGAAAAT GGTAGAAACA CTGTGCGCTT AAGAGGCAAG 4500 
GAAATTGATG 


TGAAATATAT 


CCAACCAGGT 


GTAGCAAGCG 


TAGAAGAGGT 


AATTGAAGCG 


4560 


AAACGCGTCC 


TTGAGAAGGT 


AAAAGATTTA 


TCTGATGAAG 


AAAGAGAAAC 


ACTAGCCAAA 


4620 


CTTGGTGTAA 


GTGCTGTACG 


TTTCGTTGAG 


CCAAATAATG 


CCATTACGGT 


TAATACACAA 


4680 


AACGAGTTTA 


CAACCAAACC 


ATCAAGTCAA 


GTGACAATTT 


CTGAAGGTAA 


GGCGTGTTTC 


4740 


TCAAGTGGTA 


ATGGCGCACG 


AGTATGTACC 


AATGTTGCTG 


ACGATGGACA 


GCAG 


4794 



(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4803 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 



ATGAACAAGA 


TATATCGTCT 


CAAATTCAGC 


AAACGCCTGA 


ATGCTTTGGT 


TGCTGTGTCT 


60 


GAATTGACAC 


GGGGTTGTGA 


CCATTCCACA 


GAAAAAGGCA 


GTGAAAAACC 


TGTTCGTACG 


120 


AAAGTACGCC 


ACTTGGCGTT 


AAAGCCACTT 


TCCGCTATAT 


TGCTATCTTT 


GGGCATGGCA 


180 


TCCATTCCGC 


AATCTGTTTT 


AGCGAGCGGT 


TTACAGGGAA 


TGAGCGTCGT 


ACACGGTACA 


240 


GCAACCATGC 


AAGTAGACGG 


CAATAAAACC 


ACTATCCGTA 


ATAGCGTCAA 


TGCTATCATC 


300 


AATTGGAAAC 


AATTTAACAT 


TGACCAAAAT 


GAAATGGTGC 


AGTTTTTACA 


AGAAAGCAGC 


360 


AACTCTGCCG 


TTTTCAACCG 


TGTTACATCT 


GACCAAATCT 


CCCAATTAAA 


AGGGATTTTA 


420 


GATTCTAACG 


GACAAGTCTT 


TTTAATCAAC 


CCAAATGGTA 


TCACAATAGG 


TAAAGACGCA 


480 


ATTATTAACA 


CTAATGGCTT 


TACTGCTTCT 


ACGCTAGACA 


TTTCTAACGA 


AAACATCAAG 


540 


GCGCGTAATT 


TCACCCTTGA 


GCAAACCAAG 


GATAAAGCAC 


TCGCTGAAAT 


CGTGAATCAC 


600 


GGTTTAATTA 


CCGTTGGTAA 


AGACGGTAGC 


GTAAACCTTA 


TTGGTGGCAA 


AGTGAAAAAC 


660 


GAGGGCGTGA 


TTAGCGTAAA 


TGGCGGTAGT 


ATTTCTTTAC 


TTGCAGGGCA 


AAAAATCACC 


720 


ATCAGCGATA 


TAATAAATCC 


AACCATCACT 


TACAGCATTG 


CTGCACCTGA AAACGAAGCG 


780 


ATCAATCTGG 


GCGATATTTT 


TGCCAAAGGT 


GGTAACATTA 


ATGTCCGCGC 


TGCCACTATT 


840 


CGCAATAAAG 


GTAAACTTTC 


TGCCGACTCT 


GTAAGCAAAG 


ATAAAAGTGG 


TAACATTGTT 


900 


CTCTCTGCCA 


AAGAAGGTGA 


AGCGGAAATT 


GGCGGTGTAA 


TTTCCGCTCA 


AAATCAGCAA 


960 


GCCAAAGGTG 


GTAAGTTGAT 


GATTACAGGT 


GATAAAGTCA 


CATTAAAAAC 


AGGTGCAGTT 


1020 


ATCGACCTTT 


CAGGTAAAGA 


AGGGGGAGAG 


ACTTATCTTG 


GCGGTGATGA 


GCGTGGCGAA 


1080 


GGTAAAAATG 


GTATTCAATT 


AGCGAAGAAA 


ACCTCrtTAG 


AAAAAGGCTC 


GACAATTAAT 


1140 


GTATCAGGCA 


AAGAAAAAGG 


CGGGCGCGCT 


ATTGTATGGG 


GCGATATTGC 


ATTAATTAAT 


1200 


GGTAACATTA 


ATGCTCAAGG 


TAGCGATATT 


GCTAAAACTG 


GCGGCTTTGT 


GGAAACATCA 


1260 
GGACATGACT TATCCATTGG TGATGATGTG ATTGTTGACG CTAAAGAGTG GTTAITAGAC 
CCAGATGATG TGTCCATTGA AACTCTTACA TCTGGACGCA ATAATACCGG CGAAAACCAA 
GGATATACAA CAGGAGATGG GACTAAAGAG TCACCTAAAG GTAATAGTAT TTCTAAACCT 
ACATTAACAA ACTCAACTCT TGAGCAAATC CTAAGAAGAG GTTCTTATGT TAATATCACT 
GCTAATAATA GAATTTATGT TAATAGCTCC ATCAACTTAT CTAATGGCAG TTTAAGACTT 
CACACTAAAC GAGATGGAGT TAAAATTAAC GGTGATATTA CCTCAAACGA AAATGGTAAT 
TTAACCATTA AAGCAGGCTC TTGGGTTOAT GTTCATAAAA ACATCACGCT TGGTACGGGT 
TTTTTGAATA TTGTCGCTGG GGATTCTGTA GCTTTTGAGA GAGAGGGCGA TAAAGCACGT 
AACGCAACAG ATGCTCAAAT TACCGCACAA GGGACGATAA CCGTCAATAA AGATGATAAA 
CAATTTAGAT TCAATAATGT ATCTATTAAC GGGACGGGCA AGGGTTTAAA GTTTArTGCA 
AATCAAAATA ATTTCACTCA TAAATTTGAT GGCGAAATTA ACATATCTGG AATAGTAACA 
ATTAACCAAA CCACGAAAAA AGATGTTAAA TACTGGAATG CATCAAAAGA CTCTTACTGG 
AATGTTTCTT CTCTTACTTT GAATACGGTG CAAAAATTTA CCTTTATAAA ATTCGTTCAT 
AGCGGCTCAA ATTCCCAAGA TTTGAGGTCA TCACGTAGAA GTrrPGCAGG CXSTACATTTT 
AACGGCATCG GAGGCAAAAC AAACTTCAAC ATCGGAGCTA ACOCAAAAGC CTTAnTAAA 
TTAAAACCAA ACGCCGCTAC AGACCCAAAA AAAGAATTAC CTATTACTTT TAACGCCAAC 
ACTACAGCTA jrC^^^OVGjn^ ACATACACGG -CAATCTTACC 

TCTAGAGCTG CCGOCATAAA CATGGATTCA ATTAACATTA CCGGCGGGCT TGACTTTrCC 
ATAACATCCC ATAATCGCAA TAGTAATGCT TTTGAAATCA AAAAAGACTT AACTATAAAT 
GCAACTGGCT CGAATTTTAG TCTTAAGCAA ACGAAAGATT CTTTTTATAA TGAATACAGC 
AAACACGCGA TTAACTCAAG TCATAATCTA ACCATTCTTG GCGGCAATGT CACTCTAGGT 
GGGGAAAATT CAAGCAGTAG CATTACGGGC AATATCAATA TCACCAATAA AGCAAATOTT 
ACATTACAAG CTGACACCAG CAACAGCAAC ACAGGCTTGA AGAAAAGAAC TCTAACTCTT 
GGCAATATAT CTGTTGAGGG GAATTTAAOC CTAACTGGTO CAAATGCAAA CATTGTCOOC 
AATCTTTCTA TTGCAOAAGA TTCCACATTT AAAGGAOAAO CCAGTGACAA CCTAAACATC 
ACCGGCACCT TTACCAACAA CGGTACCGCC AACATTAATA TAAAACAAGG AGTGGTAAAA 
CTCCAAGGCG ATAITATCAA TAAAGGTGGT TTAAATATCA CTACTAACGC CrTCAGGCACT 
CAAAAAACCA •rXATTAACGQ AAATATAACT AACGAAAAAG GCGACTTAAA CATCAAGAAT 
ATTAAAGCCG ACGCXGAAAT CCAAATTGGC GGCAATATCT CACAAAAAGA AGGCAATCTC 
ACAATTTCTT CTGATAAAGT AAATATTACC AATCAGATAA CAATCAAAGC AGGCGTTOAA 
G<3C3GGGCGTT CTGATTCAAG TGAGGCAGAA AATGCTAACC TAACTATTCA AACCAAAGAG ■ 
TTAAAATTGG CAGGAGACCT AAATATTTCA GGCTTTAATA AAGCAGAAAT TACAGCTAAA 
AATGGCAGTG ATTTAACTAT TGGCAATGCT AGCGGTGGTA ATGCTGATCC TAAAAAAGTG 
ACTTTTGACA AGGTTAAAGA TTCAAAAATC TCGACTGACG GTCACAATGT AACACTAAAT 



1320 

1380 

1440 

1500 

1560 

1620 

1680 

1740 

1800 

1860 

1920 

1980 

2040 

2100 

2160 

2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3060 

3120 

3180 

3240 

3300 






WO 97/36914 



PCT/US97/04707 



AGCGAAGTGA 


AAACGTCTAA 


TGGTAGTAGC 


AATGCTGGTA 


ATGATAACAG 


CACCGGTTTA 


3360 


ACCATTTCCG 


CAAAAGATGT 


AACGGTAAAC 


AATAACGTTA 


CCTCCCACAA 


GACAATAAAT 


3420 


ATCTCTGCCG 


CAGCAGGAAA 


TGTAACAACC 


AAAGAAGGCA 


CAACTATCAA 


TGCAACCACA 


3480 


GGCAGCGTGG 


AAGTAACTGC 


TCAAAATGGT 


ACAATTAAAG 


GCAACATTAC 


CTCGCAAAAT 




GTAACAGTGA 


CAGCAACAGA 


AAATCTTGTT 


ACCACAGAGA 


ATGCTGTCAT 


TAATGCAACC 


3600 


AGCGGCACAG 


TAAACATTAG 


TACAAAAACA 


GGGGATATTA 


AAGGTGGAAT 


TGAATCAACT 


3660 


TCCGGTAATG 


TAAATATTAC 


AGCGAGCGGC 


AATACACTTA 


AGGTAAGTAA 






CAAGATGTAA 


CAGTAACAGC 


GGATGCAGGA 


GCCTTGACAA 


CTACAGCAGG 


CTCAACCATT 


3780 


AGTGCGACAA 


CAGGCAATGC 


AAATATTACA 


ACCAAAACAG 


GTGATATCAA 


CGGTAAAGTT 



3840 


GAATCCAGCT 


CCGGCTCTGT 


AACACTTGTT 


GCAACTGGAG 


CAACTCTTGC 


TGTAGGTAAT 




ATTTCAGGTA 


ACACTGTTAC 


TATTACTGCG 


GATAGCGGTA 


AATTAACCTC 


CACAGTAGGT 


3960 


TCTACAATTA 


ATGGGACTAA 


TAGTGTAACC 


ACCTCAAGCC 


AATCAGGCGA 


TATTGAAGGT 


4020 


ACAATTTCTG 


GTAATACAGT 


AAATGTTACA 


GCAAGCACTG 


GTGATTTAAC 


TATTGGAAAT 


4080 


AGTGCAAAAG 


TTGAAGCGAA 


AAATGGAGCT 


GCAACCTTAA 


CTGCTGAATC 


AGGCAAATTA 


4140 


ACCACCCAAA 


CAGGCTCTAG 


CATTACCTCA 


AGCAATGGTC 


AGACAACTCT 


TACAGCCAAG 


4200 


GATAGCAGTA 


TCGCAGGAAA 


CATTAATGCT 


GCTAATGTGA 


CGTTAAATAC 


CACAGGGACT 


4260 


TTAACTACTA 


CAGGGGATTC 


AAAGATTAAC 


GCAACCAGTG 


GTACCTTAAC 


AATCAATGCA 


4320 


AAAGATGCCA 


AATTAGATGG 


TGCTGCATCA 


GGTGACCGCA 


CAGTAGTAAA 


TGCAACTAAC 


4380 



GCAAGTGGCT 


CTGGTAACGT 


GACTGCGAAA 


ACCTCAAGCA 


GCGTGAATAT 


CACCGGGGAT 


4440 


TTAAACACAA 


TAAATGGGTT 


AAATATCATT 


TCGGAAAATG 


GTAGAAACAC 


TGTGCGCTTA 


4500 


AGAGGCAAGG 


AAATTGATGT 


GAAATATATC 


CAACCAGGTG 


TAGCAAGCGT 


AGAAGAGGTA 


4560 


ATTGAAGCGA 


AACGCGTCCT 


TGAGAAGGTA 


AAAGATTTAT 


CTGATGAAGA 


AAGAGAAACA 


4620 


CTAGCCAAAC 


TTGGTGTAAG 


TGCTGTACGT 


TTCGTTGAGC 


CAAATAATGC 


CATTACGGTT 


4680 


AATACACAAA 


ACGAGTTTAC 


AACCAAACCA 


TCAAGTCAAG 


TGACAATTTC 


TGAAGGTAAG 


4740 


GCGTGTTTCT 


CAAGTGGTAA 


TGGCGCACGA 


GTATGTACCA 


ATGTTGCTGA 


CGATGGACAG 


4800 


CAG 












4803 



(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1599 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu 
1 5 10 15 

Val Ala Val Ser Glu Leu Thr Arg Gly Cys Asp His Ser Thr Glu Lys 
20 25 30 

Gly Ser Glu Lys Pro Val Arg Thr Lys Val Arg His Leu Ala Leu Lys 
35 40 45 

Pro Leu Ser Ala Ile Leu Leu Ser Leu Gly Met Ala Ser Ile Pro Gln 
50 55 60 

Ser Val Leu Ala Ser Gly Leu Gln Gly Met Ser Val Val His Gly Thr 
65 70 75 80 

Ala Thr Met Gln Val Asp Gly Asn Lys Thr Thr Ile Arg Asn Ser Val 
85 90 95 

Asn Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met 
100 105 110 

Glu Gln Phe Leu Gln Glu Ser Ser Asn Ser Ala Val Phe Asn Arg Val 
115 120 125 

Thr Ser Asp Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly 
130 135 140 

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala 
145 150 155 160 

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn 
165 170 175 

Glu Asn Ile Lys Ala Arg Asn Phe Thr Leu Glu Gln Thr Lys Asp Lys 
180 185 190 

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp 
195 200 205 

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile 
210 215 220 

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr 
225 230 235 240 

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro 
245 250 255 

Glu Asn Glu Ala Ile Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn 
260 265 270 

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Lys Gly Lys Leu Ser Ala 
275 280 285 

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys 
290 295 300 

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln 
305 310 315 320 

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys 
325 330 335 
Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr 
340 345 350 

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala 
355 360 365 

Lys Lys Thr Thr Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys 
370 375 380 

Glu Lys Gly Gly Arg Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp 
385 390 395 400 

Gly Asn Ile Asn Ala Gln Gly Lys Asp Ile Ala Lys Thr Gly Gly Phe 
405 410 415 

Val Glu Thr Ser Gly His Tyr Leu Ser Ile Asp Asp Asn Ala Ile Val 
420 425 430 

Lys Thr Lys Glu Trp Leu Leu Asp Pro Glu Asn Val Thr Ile Glu Ala 
435 440 445 

Pro Ser Ala Ser Arg Val Glu Leu Gly Ala Asp Arg Asn Ser His Ser 
450 455 460 

Ala Glu Val Ile Lys Val Thr Leu Lys Lys Asn Asn Thr Ser Leu Thr 
465 470 475 480 

Thr Leu Thr Asn Thr Thr Ile Ser Asn Leu Leu Lys Ser Ala His Val 
485 490 495 

Val Asn Ile Thr Ala Arg Arg Lys Leu Thr Val Asn Ser Ser Ile Ser 
500 505 510 

Ile Glu Arg Gly Ser His Leu Ile Leu His Ser Glu Gly Gln Gly Gly 
515 520 525 

Gln Gly Val Gln Ile Asp Lys Asp Ile Thr Ser Glu Gly Gly Asn Leu 
530 535 540 

Thr Ile Tyr Ser Gly Gly Trp Val Asp Val His Lys Asn Ile Thr Leu 
545 550 555 560 

Gly Ser Gly Phe Leu Asn Ile Thr Thr Lys Glu Gly Asp Ile Ala Phe 
565 570 575 

Glu Asp Lys Ser Gly Arg Asn Asn Leu Thr Ile Thr Ala Gln Gly Thr 
580 585 590 

Ile Thr Ser Gly Asn Ser Asn Gly Phe Arg Phe Asn Asn Val Ser Leu 
595 600 605 

Asn Ser Leu Gly Gly Lys Leu Ser Phe Thr Asp Ser Arg Glu Asp Arg 
610 615 620 

Gly Arg Arg Thr Lys Gly Asn Ile Ser Asn Lys Phe Asp Gly Thr Leu 
625 630 635 640 

Asn Ile Ser Gly Thr Val Asp Ile Ser Met Lys Ala Pro Lys Val Ser 
645 650 655 

Trp Phe Tyr Arg Asp Lys Gly Arg Thr Tyr Trp Asn Val Thr Thr Leu 
660 665 670 

Asn Val Thr Ser Gly Ser Lys Phe Asn Leu Ser Ile Asp Ser Thr Gly 
675 680 685 
Ser Gly Ser Thr Gly Pro Ser Ile Arg Asn Ala Glu Leu Asn Gly Ile 
690 695 700 

Thr Phe Asn Lys Ala Thr Phe Asn Ile Ala Gln Gly Ser Thr Ala Asn 
705 710 715 720 

Phe Ser Ile Lys Ala Ser Ile Met Pro Phe Lys Ser Asn Ala Asn Tyr 
725 730 735 

Ala Leu Phe Asn Glu Asp Ile Ser Val Ser Gly Gly Gly Ser Val Asn 
740 745 750 

Phe Lys Leu Asn Ala Ser Ser Ser Asn Ile Gln Thr Pro Gly Val Ile 
755 760 765 

Ile Lys Ser Gln Asn Phe Asn Val Ser Gly Gly Ser Thr Leu Asn Leu 
770 775 780 

Lys Ala Glu Gly Ser Thr Glu Thr Ala Phe Ser Ile Glu Asn Asp Leu 
785 790 795 800 

Asn Leu Asn Ala Thr Gly Gly Asn Ile Thr Ile Arg Gln Val Glu Gly 
805 810 815 

Thr Asp Ser Arg Val Asn Lys Gly Val Ala Ala Lys Lys Asn Ile Thr 
820 825 830 

Phe Lys Gly Gly Asn Ile Thr Phe Gly Ser Gln Lys Ala Thr Thr Glu 
835 840 845 

Ile Lys Gly Asn Val Thr Ile Asn Lys Asn Thr Asn Ala Thr Leu Arg 
850 855 860 

Gly Ala Asn Phe Ala Glu Asn Lys Ser Pro Leu Asn Ile Ala Gly Asn 
865 870 875 880 

Val Ile Asn Asn Gly Asn Leu Thr Thr Ala Gly Ser Ile Ile Asn Ile 
885 890 895 

Ala Gly Asn Leu Thr Val Ser Lys Gly Ala Asn Leu Gln Ala Ile Thr 
900 905 910 

Asn Tyr Thr Phe Asn Val Ala Gly Ser Phe Asp Asn Asn Gly Ala Ser 
915 920 925 

Asn Ile Ser Ile Ala Arg Gly Gly Ala Lys Phe Lys Asp Ile Asn Asn 
930 935 940 

Thr Ser Ser Leu Asn Ile Thr Thr Asn Ser Asp Thr Thr Tyr Arg Thr 
945 950 955 960 

Ile Ile Lys Gly Asn Ile Ser Asn Lys Ser Gly Asp Leu Asn Ile Ile 
965 970 975 

Asp Lys Lys Ser Asp Ala Glu Ile Gln Ile Gly Gly Asn Ile Ser Gln 
980 985 990 

Lys Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Val Asn Ile Thr Asn 
995 1000 1005 

1010 1015 1020 

Glu Ala Glu Asn Ala Asn Leu Thr Ile Gln Thr Lys Glu Leu Lys Leu 
1025 1030 1035 1040 
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Ala Gly Asp Leu Asn lie Ser Gly Phe Asn Lys Ala Glu lie Thr Ala 
104S 1050 1055 

Lys Asn Gly Ser Asp Leu Thr lie Gly Asn Ala Ser Gly Gly Asn Ala 
1060 1065 1070 

Asp Ala Lys Lys Val Thr Phe Asp Lys Val Lys Asp Ser Lys lie Ser 
1075 1080 1085 

Thr Asp Gly His Asn Val Thr Leu Asn Ser Glu Val Lys Thr Ser Asn 
1090 1095 1100 

Gly Ser Ser Asn Ala Gly Asn Asp Asn Ser Thr Gly Leu Thr lie Ser 
llOS 1110 1115 1120 

Ala Lys Asp Val Thr Val Asn Asn Asn Val Thr Ser His Lys Thr lie 
1125 1130 1135 

Asn lie Ser Ala Ala Ala Gly Asn Val Thr Thr Lys Glu Gly Thr Thr 
1140 1145 iiso 

lie Asn Ala Thr Thr Gly Ser Val Glu Val Thr Ala Gin Asn Gly Thr 
1155 1160 1165 

lie Lys Gly Asn lie Thr Ser Gin Asn Val Thr Val Thr Ala Thr Glu 
1170 1175 1180 

Asn Leu Val Thr Thr Glu Asn Ala Val lie Asn Ala Thr Ser Gly Thr 
1185 1190 1195 1200 

Val Asn lie Ser Thr Lys Thr Gly Asp lie tys Gly Gly He Glu Ser 
1205 1210 1215 



Thr Ser Gly Asn Val Asn Ile Thr Ala Ser Gly Asn Thr Leu Lys Val 
1220 1225 1230 

Ser Asn Ile Thr Gly Gln Asp Val Thr Val Thr Ala Asp Ala Gly Ala 
1235 1240 1245 

Leu Thr Thr Thr Ala Gly Ser Thr Ile Ser Ala Thr Thr Gly Asn Ala 
1250 1255 1260 

Asn Ile Thr Thr Lys Thr Gly Asp Ile Asn Gly Lys Val Glu Ser Ser 
1265 1270 1275 1280 

Ser Gly Ser Val Thr Leu Val Ala Thr Gly Ala Thr Leu Ala Val Gly 
1285 1290 1295 

Asn Ile Ser Gly Asn Thr Val Thr Ile Thr Ala Asp Ser Gly Lys Leu 
1300 1305 1310 

Thr Ser Thr Val Gly Ser Thr Ile Asn Gly Thr Asn Ser Val Thr Thr 
1315 1320 1325 

Ser Ser Gln Ser Gly Asp Ile Glu Gly Thr Ile Ser Gly Asn Thr Val 
1330 1335 1340 

Asn Val Thr Ala Ser Thr Gly Asp Leu Thr Ile Gly Asn Ser Ala Lys 
1345 1350 1355 1360 

Val Glu Ala Lys Asn Gly Ala Ala Thr Leu Thr Ala Glu Ser Gly Lys 
1365 1370 1375 

Leu Thr Thr Gln Thr Gly Ser Ser Ile Thr Ser Ser Asn Gly Gln Thr 
1380 1385 1390 
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Thr Leu Thr Ala Lys Asp Ser Ser He Ala Gly Asn He Asn Ala Ala 

1400 1405 

A.n v^al^Thr Leu Asn Thr TJr Gly Thr Leu Thr Thr Thr Gly Asp Ser 

^^■^^ 1420 
Lysine Asn Ala Thr Ser^Gly Thr Leu Thr Ile^Asn Ala Lys Asp Ala 

Lys Leu ASP Gly Ala^Ala Ser Gly Asp Arg^Thr Val Val Asn Ala Thr^ 

Asn Ala ser Gly^ser Gly Asn Val Thr^Ala Lys Thr Ser ^er^s^Val 

Asn lle Thr^Gly Asp Leu Asn Thr He Asn Gly Leu Asn IlTile Ser 

Glu Asn^Gly Arg Asn Thr Val Arg Leu Arg Gly Lys Glu He Asp Val 

^"^^^ 1500 
Jyj^Tyr II. Gl„ Pro Oly^v.l Al. s,r V.l Olu^du v.l II, oi„ 

Arg V.1 olu^Lys V.1 ^y. A.p ^ ^ 111° 

Thr Leu Ala Lys^Leu Gly Val Ser Ala Val Arg Phe Val Glu Pro Asn 

^^^^ 1550 

Asn Ala He Thr Val Asn Thr Gin Asn Glu Phe Thr Thr Lys Pro Ser 

1560 2^5g5 

ser Gln^val Thr. He . Ser Glu^Gly Lys Ala Cys Phe ser Ser .Gly Asn 



1560 

Gly^Ala Arg Val Cys Thr Asn Val Ala Asp Asp Gly Gin Gin Pro 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1600 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu 
Val Ala Val Ser Glu Leu Thr Arg Gly Cys Asp His Ser Thr Glu Lys 
Gly Ser Glu Lys Pro Val Arg Thr Lys Val Arg His Leu Ala Leu Lys 
Pro Leu Ser Ala Ile Leu Leu Ser Leu Gly Met Ala Ser Ile Pro Gln 
Ser Val Leu Ala Ser Gly Leu Gln Gly Met Ser Val Val His Gly Thr 
Ala Thr Met Gln Val Asp Gly Asn Lys Thr Thr Ile Arg Asn Ser Val 
85 90 95 

Asn Ala Ile Ile Asn Trp Lys Gln Phe Asn Ile Asp Gln Asn Glu Met 
100 105 110 

Glu Gln Phe Leu Gln Glu Ser Ser Asn Ser Ala Val Phe Asn Arg Val 
115 120 125 

Thr Ser Asp Gln Ile Ser Gln Leu Lys Gly Ile Leu Asp Ser Asn Gly 
130 135 140 

Gln Val Phe Leu Ile Asn Pro Asn Gly Ile Thr Ile Gly Lys Asp Ala 
145 150 155 160 

Ile Ile Asn Thr Asn Gly Phe Thr Ala Ser Thr Leu Asp Ile Ser Asn 
165 170 175 

Glu Asn Ile Lys Ala Arg Asn Phe Thr Leu Glu Gln Thr Lys Asp Lys 
180 185 190 

Ala Leu Ala Glu Ile Val Asn His Gly Leu Ile Thr Val Gly Lys Asp 
195 200 205 

Gly Ser Val Asn Leu Ile Gly Gly Lys Val Lys Asn Glu Gly Val Ile 
210 215 220 

Ser Val Asn Gly Gly Ser Ile Ser Leu Leu Ala Gly Gln Lys Ile Thr 
225 230 235 240 

Ile Ser Asp Ile Ile Asn Pro Thr Ile Thr Tyr Ser Ile Ala Ala Pro 
245 250 255 

Glu Asn Glu Ala Ile Asn Leu Gly Asp Ile Phe Ala Lys Gly Gly Asn 
260 265 270 

Ile Asn Val Arg Ala Ala Thr Ile Arg Asn Lys Gly Lys Leu Ser Ala 
275 280 285 

Asp Ser Val Ser Lys Asp Lys Ser Gly Asn Ile Val Leu Ser Ala Lys 
290 295 300 

Glu Gly Glu Ala Glu Ile Gly Gly Val Ile Ser Ala Gln Asn Gln Gln 
305 310 315 320 

Ala Lys Gly Gly Lys Leu Met Ile Thr Gly Asp Lys Val Thr Leu Lys 
325 330 335 

Thr Gly Ala Val Ile Asp Leu Ser Gly Lys Glu Gly Gly Glu Thr Tyr 
340 345 350 

Leu Gly Gly Asp Glu Arg Gly Glu Gly Lys Asn Gly Ile Gln Leu Ala 
355 360 365 

Lys Lys Thr Thr Leu Glu Lys Gly Ser Thr Ile Asn Val Ser Gly Lys 
370 375 380 

Glu Lys Gly Gly Arg Ala Ile Val Trp Gly Asp Ile Ala Leu Ile Asp 
385 390 395 400 

Gly Asn Ile Asn Ala Gln Gly Ser Asp Ile Ala Lys Thr Gly Gly Phe 
405 410 415 

Val Glu Thr Ser Gly His Asp Leu Ser Ile Gly Asp Asp Val Ile Val 
420 425 430 
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ASP Ala Ly| Glu Trp Leu Leu Asp Pro Asp Asp Val Ser He Glu Thr 

Leu Thr ser Gly Arg Asn Asn Thr Gly Glu Asn Gin Gly Tyr Thr Thr 

Gly ASP Gly Thr Lys Glu Ser Pro Lys Gly Asn Ser He Ser Lys Pro 

Thr Leu Thr Asn Ser Thr Leu Glu Gin lie Leu Arg Arg Gly Ser Tyr 

val Asn rle Thr Ala Asn Asn Arg lie Tyr Val Asn Ser Ser ll! Asn 

Leu ser Asn Gly Ser Leu Thr Leu His Thr Lys Arg Asp Gly Val Lys 

^-2° 525 
He Asn Gly Asp He Th. Ser Asn Glu Asn Gly Asn Leu Thr He Lys 

540 ^ 
Ala Gly Ser Trp Val Asp Val His Lys Asn lie Thr Leu Gly Thx Gly 

Phe Leu Asn He Val Ala Gly Asp Ser Val Ala Phe Glu Ar^ Glu G^y 

575 

ASP Lys Ala Arg Asn Ala Thr Asp Ala Gin He Thr Ala Gin Gly Thr 

He Thr Val Asn Lys Asp Asp Lys Gin Phe Arg Phe Asn Asn Val Ser 

605 

Leu Asn Gly Tlxr Gly Lys Gly Leu Lys Phe He Ala Asn Gin Asn Asn 

^15 520 

Phe Thr His Lys Phe Asp Gly Glu He Asn He Ser Gly He Val Thr 

^35 

He Asn Gin Thr Thr Lys Lys Asp Val Lys Tyr Trp Asn Xla Ser Lys 

ASP ser Tyr Trp Asn Val Ser Ser Leu Thr Leu Asn Thr Val 71 Lys 

Phe Thr Phe He Lys Phe Val Asp Ser Gly Ser Asn Gly gI! Asp Leu 

665 

Arg ser Ser Arg Arg Ser Phe Ala Gly Val His Phe Asn Gly He Gly 

700 ^ 



Gly Lys Thr Asn Phe Asn He Gly Ala Asn Ala Lys Ala Leu Phe Lys 

"^^5 720 
Leu Lys Pro Asa Ala Ala Thr Asp Pro Lys Lys Glu Leu Pro He Thr 

'^^^ 735 



Phe Asa Ala Asn He Thr Ala «u: Gly Asn Ser Asp Ser Ser Val Met 

750 

Phe ASP He His Ala Asn Leu Thr Ser Arg Ala Ala Gly He Asn Met 

'^0 765 
Asp Ser Ile Asn Ile Thr Gly Gly Leu Asp Phe Ser Ile Thr Ser His 
770 775 780 




Asn Arg Asn Ser Asn Ala Phe Glu Ile Lys Lys Asp Leu Thr Ile Asn 
785 790 795 800 

Ala Thr Gly Ser Asn Phe Ser Leu Lys Gln Thr Lys Asp Ser Phe Tyr 
805 810 815 

Asn Glu Tyr Ser Lys His Ala Ile Asn Ser Ser His Asn Leu Thr Ile 
820 825 830 

Leu Gly Gly Asn Val Thr Leu Gly Gly Glu Asn Ser Ser Ser Ser Ile 
835 840 845 

Thr Gly Asn Ile Asn Ile Thr Asn Lys Ala Asn Val Thr Leu Gln Ala 
850 855 860 

Asp Thr Ser Asn Ser Asn Thr Gly Leu Lys Lys Arg Thr Leu Thr Leu 
865 870 875 880 

Gly Asn Ile Ser Val Glu Gly Asn Leu Ser Leu Thr Gly Ala Asn Ala 
885 890 895 

Asn Ile Val Gly Asn Leu Ser Ile Ala Glu Asp Ser Thr Phe Lys Gly 
900 905 910 

Glu Ala Ser Asp Asn Leu Asn Ile Thr Gly Thr Phe Thr Asn Asn Gly 
915 920 925 

Thr Ala Asn Ile Asn Ile Lys Gly Val Val Lys Leu Gly Asp Ile Asn 
930 935 940 

Asn Lys Gly Gly Leu Asn Ile Thr Thr Asn Ala Ser Gly Thr Gln Lys 
945 950 955 960 

Thr Ile Ile Asn Gly Asn Ile Thr Asn Glu Lys Gly Asp Leu Asn Ile 
965 970 975 

Lys Asn Ile Lys Ala Asp Ala Glu Ile Gln Ile Gly Gly Asn Ile Ser 
980 985 990 

Gln Lys Glu Gly Asn Leu Thr Ile Ser Ser Asp Lys Val Asn Ile Thr 
995 1000 1005 

Asn Gln Ile Thr Ile Lys Ala Gly Val Glu Gly Gly Arg Ser Asp Ser 
1010 1015 1020 

Ser Glu Ala Glu Asn Ala Asn Leu Thr Ile Gln Thr Lys Glu Leu Lys 
1025 1030 1035 1040 

Leu Ala Gly Asp Leu Asn Ile Ser Gly Phe Asn Lys Ala Glu Ile Thr 
1045 1050 1055 

Ala Lys Asn Gly Ser Asp Leu Thr Ile Gly Asn Ala Ser Gly Gly Asn 
1060 1065 1070 

Ala Asp Ala Lys Lys Val Thr Phe Asp Lys Val Lys Asp Ser Lys Ile 
1075 1080 1085 

Ser Thr Asp Gly His Asn Val Thr Leu Asn Ser Glu Val Lys Thr Ser 
1090 1095 1100 

Asn Gly Ser Ser Asn Ala Gly Asn Asp Asn Ser Thr Gly Leu Thr Ile 
1105 1110 1115 1120 

Ser Ala Lys Asp Val Thr Val Asn Asn Asn Val Thr Ser His Lys Thr 
1125 1130 1135 
Ile Asn Ile Ser Ala Ala Ala Gly Asn Val Thr Thr Lys Glu Gly Thr 
1140 1145 1150 

Thr Ile Asn Ala Thr Thr Gly Ser Val Glu Val Thr Ala Gln Asn Gly 
1155 1160 1165 

SeJ^ Gin Asn Val Thr Val Thr Ala Thr 
11^0 1175 1180 

?^-c^" "^^"^ Ala Val He Asn Ala Thr Ser Gly 

1190 1195 1200 

Thr Val Asn He Ser Thr Lys Thr Gly Asp He Lys Gly Gly lie Glu 
1205 1210 1215 

Ser Thr Ser Gly Asn Val Asn lie Thr Ala Ser Gly Asn Thr Leu Lys 
1220 1225 1230 

val Ser Asn He Thr Gly Gin Asp Val Thr Val Thr Ala Asp Ala Gly 
1235 1240 124S 

"^^"^ '^^'^ S®'^ Thr He Ser Ala Thr Thr Gly Asn 

1250 1255 1260 

Ala Asn He Thr Thr Lys Thr Gly Asp He Asn Gly Lys Val Glu Ser 

1270 1275 1280 

Ser Ser Gly Ser Val Thr Leu Val Ala Thr Gly Ala Thr Leu Ala Val 
1285 1290 1295 

Gly Asn He Ser Gly Asn Thr Val Thr He Thr Ala Asp Ser Gly Lvs 
1300 1305 1310 

Leu Thr ser Thr Val GlySer Thr lie Asa Gly Thr Asn Ser Val Thr 
1315 1320 1325 

"^^^ f^^n^®"^ lie Ser Gly Asn Thr 

1330 1335 1340 

Ytic'^*'' ''^'^ ^^"^ "^"^ Thr He Gly Asn Ser Ala 

1350 13SS j^3go 

Lys Val Glu Ala Lys Asn Gly Ala Ala Thr Leu Thr Ala Glu Ser Gly 
1365 1370 1375 

Lys Leu Thr Thr Gin Thr Gly Ser Ser He Thr Ser Ser Asn Gly Gin 
1380 1385 1390 

Thr Thr Leu Thr Ala Lys Asp Ser Ser He Ala Gly Asn He Asn Ala 
i^'S 1400 140S 

Thr Gly Thr Leu Thr Thr Thr Gly Asp 
1*10 141S 1420 

^A^J'^° ^* Thr He Asn Ala Lys Asp 

1*^^ 1*30 1435 ' 

Ala Lys Leu Asp Gly Ala Ala Ser Gly Asp Arg Thr Val Val Asn Ala 
1445 1450 1455 

Thr Asn Ala Ser Gly Ser Gly Asn Val Thr Ala Lys Thr Ser Ser Ser 
1460 1465 1470 

Val Asn Ile Thr Gly Asp Leu Asn Thr Ile Asn Gly Leu Asn Ile Ile 
1475 1480 1485 
Ser Glu Asn Gly Arg Asn Thr Val Arg Leu Arg Gly Lys Glu Ile Asp 
1490 1495 1500 

Val Lys Tyr Ile Gln Pro Gly Val Ala Ser Val Glu Glu Val Ile Glu 
1505 1510 1515 1520 

Ala Lys Arg Val Leu Glu Lys Val Lys Asp Leu Ser Asp Glu Glu Arg 
1525 1530 1535 

Glu Thr Leu Ala Lys Leu Gly Val Ser Ala Val Arg Phe Val Glu Pro 
1540 1545 1550 

Asn Asn Ala Ile Thr Val Asn Thr Gln Asn Glu Phe Thr Thr Lys Pro 
1555 1560 1565 

Ser Ser Gln Val Thr Ile Ser Glu Gly Lys Ala Cys Phe Ser Ser Gly 
1570 1575 1580 

Asn Gly Ala Arg Val Cys Thr Asn Val Ala Asp Asp Gly Gln Gln Pro 
1585 1590 1595 1600 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Val Asp Glu Val Ile Glu Ala Lys Arg Ile Leu Glu Lys Val Lys Asp 
1 5 10 15 

Leu Ser Asp Glu Glu Arg Glu Ala Leu Ala Lys Leu Gly 
20 25 
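
The three-letter listing for SEQ ID NO: 11 can be checked mechanically against its stated length of 29 amino acids. The following is a minimal Python sketch, not part of the specification: the helper name `to_one_letter` is hypothetical, the code table is the standard IUPAC three-letter to one-letter mapping, and the input string is the SEQ ID NO: 11 listing with isoleucine written as Ile.

```python
# Standard three-letter to one-letter amino acid codes (IUPAC).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq: str) -> str:
    """Convert a whitespace-separated three-letter sequence to one-letter codes.

    Raises ValueError on any token that is not a standard residue code, which
    also flags transcription damage such as "He" in place of "Ile".
    """
    out = []
    for pos, token in enumerate(three_letter_seq.split(), start=1):
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognized residue {token!r} at position {pos}")
        out.append(THREE_TO_ONE[token])
    return "".join(out)


# SEQ ID NO: 11 as listed (hypothetical variable name).
SEQ_ID_NO_11 = (
    "Val Asp Glu Val Ile Glu Ala Lys Arg Ile Leu Glu Lys Val Lys Asp "
    "Leu Ser Asp Glu Glu Arg Glu Ala Leu Ala Lys Leu Gly"
)

peptide = to_one_letter(SEQ_ID_NO_11)
print(peptide, len(peptide))  # VDEVIEAKRILEKVKDLSDEEREALAKLG 29
```

The same function can be run over the longer listings; any token it rejects marks a position that needs to be checked against the original document rather than guessed.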
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What I claim is: 

1. An isolated and purified nucleic acid molecule 
encoding a high molecular weight protein (HMW) HMW3 or 
HMW4 of a non-typeable Haemophilus strain or a variant 
or fragment thereof retaining the immunological 
ability to protect against disease caused by a 
non-typeable Haemophilus strain, having: 

(a) the DNA sequence shown in Figure 8 (SEQ ID No: 
7) and encoding protein HMW3 having the derived 
amino acid sequence of Figure 10 (SEQ ID No: 9), or 

(b) the DNA sequence shown in Figure 9 (SEQ ID No: 
8) and encoding protein HMW4 having the derived 
amino acid sequence of Figure 10 (SEQ ID No: 10). 

2. An isolated and purified nucleic acid molecule 
encoding a high molecular weight protein (HMW) of a 
non-typeable Haemophilus strain, which is selected from the 
group consisting of: 

(a) a DNA sequence as shown in any one of Figures 
8 and 9 (SEQ ID Nos: 7 and 8); 

(b) a DNA sequence encoding an amino acid 
sequence as shown in Figure 10 (SEQ ID Nos: 9 and 
10); or 

(c) a DNA sequence encoding a high molecular 
weight protein of a non-typeable Haemophilus strain 
which hybridizes under stringent conditions to any 
one of the DNA sequences of (a) and (b). 
3. The nucleic acid molecule of claim 2 wherein the 
DNA sequence (c) has at least about a 90% identity of 
sequence to the DNA sequences (a) or (b). 
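
The claim does not say how percent identity is to be computed. As an illustration only, the Python sketch below uses one common convention: identical positions divided by compared columns of a pairwise alignment that has already been made, with `-` marking a gap. The function name `percent_identity` and the two short sequences are hypothetical and are not taken from SEQ ID Nos: 7 or 8.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Both strings must have the same length; '-' marks a gap. Columns that are
    gaps in both rows are skipped. A column with a gap in only one row counts
    as a compared, non-identical position (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical): 9 of 10 positions agree.
a = "ATGAACAAGA"
b = "ATGAACAAGT"
identity = percent_identity(a, b)
print(identity, identity >= 90.0)  # 90.0 True
```

Other conventions (for example dividing by the length of the shorter sequence, or excluding gapped columns) give different values near the threshold, so the convention should be stated whenever a figure such as 90% is reported. Hybridization under stringent conditions, as recited in claim 2(c), is an experimental criterion and is not modelled here.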

4. A vector for transformation of a host comprising 
the nucleic acid molecule of claim 2. 

5. An isolated and purified high molecular weight 
(HMW) protein of non-typeable Haemophilus or any variant 
or fragment thereof retaining the immunological ability 
to protect against disease caused by a non-typeable 
Haemophilus strain, which is characterized by at least 
one surface-exposed B-cell epitope which is recognized 
by monoclonal antibody AD6. 

6. The protein of claim 5 which is HMW1 encoded by the 
DNA sequence shown in Figure 1 (SEQ ID No: 1), having 
the derived amino acid sequence of Figure 2 (SEQ ID No: 
2) and having an apparent molecular weight of 125 kDa. 

7. The protein of claim 5 which is HMW2 encoded by the 
DNA sequence shown in Figure 3 (SEQ ID No: 3) and having 
the derived amino acid sequence of Figure 4 (SEQ ID No: 
4) and having an apparent molecular weight of 120 kDa. 

8. The protein claimed in claim 5 which is HMW3 
encoded by the DNA sequence shown in Figure 8 (SEQ ID 
No: 7) and having the derived amino acid sequence of 
Figure 10 (SEQ ID No: 9) and having an apparent 
molecular weight of 125 kDa. 

9. The protein claimed in claim 5 which is HMW4 
encoded by the DNA sequence shown in Figure 9 (SEQ ID 
No: 8) and having the derived amino acid sequence shown 
in Figure 10 (SEQ ID No: 10) and having an apparent 
molecular weight of 123 kDa. 

10. A conjugate comprising a protein as claimed in 
claim 5 linked to an antigen, hapten or polysaccharide 
for eliciting an immune response to said antigen, hapten 
or polysaccharide. 

11. The conjugate as claimed in claim 10 wherein said 
polysaccharide is a protective polysaccharide against 
Haemophilus influenzae type b. 

12. A synthetic peptide having an amino acid sequence 
containing at least six amino acids and no more than 150 
amino acids and corresponding to at least one protective 
epitope of a high molecular weight protein HMW1, HMW2, 
HMW3 or HMW4 of non-typeable Haemophilus influenzae, 
wherein the epitope is recognized by at least one of 
monoclonal antibodies AD6 and 10C5. 

13. The peptide as claimed in claim 12 wherein the 
epitope is located within 75 amino acids of the carboxy 
terminus of the HMW1 or HMW2 protein. 
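
The two numeric limits in claims 12 and 13 (a peptide of 6 to 150 amino acids, and an epitope within 75 amino acids of the carboxy terminus) can be expressed as simple checks. The Python sketch below is illustrative only. All function names are hypothetical, the protein is a toy string rather than HMW1 or HMW2, and "within 75 amino acids of the carboxy terminus" is read here as "lying wholly inside the last 75 residues", which is an assumption. Antibody recognition by AD6 or 10C5 is an experimental property and is not modelled.

```python
MIN_LEN = 6             # "at least six amino acids" (claim 12)
MAX_LEN = 150           # "no more than 150 amino acids" (claim 12)
C_TERMINAL_WINDOW = 75  # "within 75 amino acids of the carboxy terminus" (claim 13)


def length_in_range(peptide: str) -> bool:
    """True if the peptide length satisfies the claim 12 bounds."""
    return MIN_LEN <= len(peptide) <= MAX_LEN


def c_terminal_region(protein: str, window: int = C_TERMINAL_WINDOW) -> str:
    """Return the last `window` residues (the whole protein if it is shorter)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return protein[-window:]


def lies_in_c_terminal_region(protein: str, peptide: str,
                              window: int = C_TERMINAL_WINDOW) -> bool:
    """True if the peptide occurs wholly inside the C-terminal window."""
    return peptide in c_terminal_region(protein, window)


# Toy protein in one-letter code (hypothetical): 121 residues.
protein = "M" + "G" * 100 + "ACDEFGHIKLMNPQRSTVWY"

print(length_in_range("HIKLMNPQ"))                     # True  (8 residues)
print(length_in_range("HIKLM"))                        # False (5 residues)
print(lies_in_c_terminal_region(protein, "HIKLMNPQ"))  # True
print(lies_in_c_terminal_region(protein, "MGGGGG"))    # False (N-terminal)
```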
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f.:, I HMW3 nuc '^"^'rfg sequence 

REFORMAT of: TenpS.Ccg check: -1 from: 1 to: 4794 October 5, 1995 17:43 
(No documentation) 
Hmw3.Gcg Length: 4794 October 5, 1995 18:29 Type: N Check: 4A4 .. 

1 ATGAACAACA TATATCCTCT CAAATTCACC AAACCCCTGA ATCCTTTCCT TCCTCTCTCT CAATTCACAC CCCCT7C7CA CCATTCCACA ^-,*,*,*j^Ar.nCft 
101 CTCAAAAACC TCTTCCTACC AAACTACCCC ACTTCCCCTT AMCCCACTT TCCCCTATAT TCCTATCTTT CCCCATCCCA TCCATTCCCC AATCTCTTTT 
201 AGCGACCCCT TTACACCGAA TGACCCTCCT ACACCGTACA CCAACCATCC AACTACACCC CAATAAAACC ACTATCCGTA ATACCCTCAA TCCTATCATC 
301 AATTCGAAAC AATTTAACAT TOACCAAAAT GAAATCGTCC ACTTTTTACA AGAAACCACC AACTCTCCCG TTTTCAACCC TCTTACATCT CACCAAATCT 
401 CCCAATTAAA ACCCATTTTA CATTCTAACC GACAACTCTT TTTAATCAAC CCAAATCCTA TCACAATACC TAAACACCCA ATTATTAACA CTAATGCCTT 
501 TACTCCTTCT ACCCTAGACA TTTCTAAC&A AAACATCAAC CCCCCTAATT TCACCCTTGA CCAAA C CAAC CATAAAGCAC TCCCTGAAAT CCTGAATCAC 
601 CCTTTAATTA CCCTTSGTAA ACACSCTACC CTAAACCTTA TTCCTCCCAA AGTCAAAAAC CACCCCCTCA TTACCGTAAA TCCCCGTACT ATTTCTTTAC 
701 nCCACCCCA AAAAATCACC ATCACCGATA TAATAAATCC AACCATCACT TACACCATTC CTCCACCTCA AAACCAACCC ATCAATCTCC GCCATATTTT 
801 TCCCAAACCT CCTAACATTA ATCTCCCCCC TGCCACTATT CCCAATAAAC CTAAACTTTC TCCCCACTCT GTAACCAAAC ATAAAAGTGC TAACATTCTT 
901 CTCTCTCCCA AAGAACGTGA AC C SCAAATT CCCCGTGTAA TTTCCCCTCA AAATCACCAA CCCAAACCTC CTAAGTTGAT GATTACACCC GATAAACTTA 
1001 CATTCAAAAC CCCTCCACTT ATCCACCTTT CGCCTAAACA ACCCCCACAA ACTTATCTTC CCCCTCACCA CCCTCCCGAA CGTAAAAACG GCATTCAATT 
1101 ACCAAACAAA ACCACTTTAC AAAAACCaC AACAATTAAT CTCTCACCTA AAGAAAAACC TCCCCCCCCT ATTGTATCCC GCCATATTCC CTTAATTGAC 
1201 GCCAATATTA ATGCCCAACC TAAAGATATC CCTAAAACTC CTCCTTTTGT CCAGACCTCG CCGCATTACT TATCCATTCA TGATAACCCA ATTCTTAAAA 
1301 CAAAAGAAT6 GCTACTAGAC CC AGACAATC TCACTATTGA ACCTCCTTCC CCTTCTCGCG TCCACCTCCC TCCCGATACC AA7TCCCAC7 CSCCAOACGT 
1401 GATAAAACTC ACCCTAAAAA AAAATAACAC CTCCTTCACA ACACTAACCA ATACAACCAT TTCAAATCTT CTCAAAA6TC CCCACCTCGT GAACATAACC 
1501 CCAACGACAA AACTTACCCT TAATACCTCT ATCACTATAG AAACACCCTC CCACTTAATT CTCCACACTC AACCTCACGG CCGTCAACCT GTTCAGATTC 
1601 ATAAACATAT TACTTCTCAA 6CCGCAAATT-TAACCATTTA TTCTCCCCGA TGCCTTCATG TTCATAAAAA TAT7ACCCTT CGTACCCCCT TTTTAAACAT 
1701 CACAACTAAA CAACCAGATA TCCCCTTCGA ACACAAGTCT CGACGGAACA ACCTAACCAT TACACCCCAA GCCACCATCA CCTCAGCTAA TACTAACCCC 
1801 TTTACATTTA ACAACGTCTC TCTAAACACC CTTCCCCGAA ACCTCAGCTT TACTGACACC ACAGACCACA CACGTACAAG AACTAAGCGT AATATCTCAA 
1901 ACAAATTTGA CCCAACCTTA AACATTTCCC- CAACTCTAGA TATCTCAATG AAACCAOXA AACTCACCT6 GTTTTACAGA CACAAAGCAC CCACCTACTC 
2001 CAACCTAACC ACTTTAAATC TTACCTCCCG TAGTAAATTT AACCTCTCCA TTGACACCAC ACCAACTGGC TCAACACCTC CAACCAlACG CAATGCACAA 
2101 TTAAATCGCA TAACATTTAA TAAAC CC ACT TTTAATATCG CACAACCCTC AACACCTAAC TTTACCATCA ACCCATCAAT AATCCCCTTT AACACTAACG 
2201 CTAACTACCC ATTATTTAAT CAACATATTT CACTCTCACC GGCCCCTACC GTTAATTTCA AACTTAACCC CTCATCTACC AACATACAAA CCCCTCCCGT 
2501 AAHATAAAA TCTCAAAACT nAATCTCTC ACCACCCTCA ACTTTAAATC TCAACCCTCA ACGTTCAACA CAAACCCCTT TTTCAATAGA AAATCATTTA 
2401 AACTTAAACC CCACCCCTCC CAATATAACA ATCAGACAAC TCCACCCTAC CCATTCACCC GTCAACAAAC GTCTCCCACC CAAAAAAAAC ATAACTTTTA 
2501 AACCCCCTAA TATCACCTTC CCCTCTCAAA AACCCACAAC ACAAATCAAA CCCAATCTTA CCATCAATAA AAACACTAAC GCTACTCTTC CTCCTCCCAA 
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260t TTTT CCOSM AACAAATCCC CTTTAAATAT ACCACCAAAT CTTATTAATA ATCCCAACCT TACCACTCCC CCCTCCATTA TCAATATACC CCCAAATCTT 
2701 ACTCTTTCAA AACCCCCTAA CCTTCAACCT ATAACAAATT ACACTTTTAA TCTACCCCCC TCATTTCACA ACAATCCCCC TTCAAACATT TCCATTCCCA 
2801 GACCAfiCCCC TAAArTTAAA CATATCAATA ACACCACTAC CTTAAATATT ACCACCAACT CTCATACCAC TTACCCCACC ATTATAAAAC CCAATATATC 
2901 CAACAAATCA CCTCATTTCA ATATTATTCA TAAAAAAACC CACCCTCAAA TCCAAATTCC CCCCAATATC TCACAAAAAG AACCCAATCT CACAATTTCT 
5001 TCTCATAAAC TAAATATTAC CAATCACATA ACAATCAAAC CACCCCTTGA ACCCCCCCCT TCTCATTCAA CTGACCCACA AAATCCTAAC CTAACfATTC 
S101 AAA CC AAACA CTTAAAATTC CCACCACACC TAAATATTTC ACCC7TTAAT AAAGCAGAAA TTACACCTAA AAATCGCACT CATTTAACTA TTCCCAATCC 
5201 TACC6CTCCT AATCCTCATC CTAAAAAACT CACTTTTGAC AACCTTAAAG ATTCAAAAAT CTCCACTCAC CCTCACAATC TAACACTAAA TACCCAACTC 
1301 AAAACCTCTA ATCCTACTAC CAATCC7CCT AATCATAACA CCACCCCTTT AACCATTTCC CCAAAACATC TAACCGTAAA CAATAACCTT ACCTCCCACA 
5401 ACACAATAAA TATCTCTCCC CCACCACGAA ATCTAACAAC rAAAGAAG CC ACAACTATCA ATCCAACCAC ACCCACCCTC GAACTAACTC CTCAAAATCC 
5301 rACAATTAAA CCCAACATTA CCTCCCAAAA TCTAACACTC ACACCAACAC AAAATCTTCT TACCACACAC AATCCTGTCA TTAATCCAAC >-*ffrCGnVCA 
5601 CTAAACATTA CTACAAAAAG ACCCCATATT AAAGCTCGAA TTGAATCAAC TTCCCCTAAT CTAAATATTA CACCCACCCC CAATACACTT AACCTAACIA 
5701 ATATCACTCC TCAACATCTA ACACTAACAC CCCATCCACC ACCCTTCACA ACTACAGCAC CCTCAACCAT lAGTCCCACA ACACCCAATC CAAATATTAC 
5801 AACCAAAACA CCTGATATCA ACCCTAAACT TCAATCCACC TCCGCCTCTC TAACACTTCT TCCAACTGCA CCAACTCTTC CTGTACGTAA TATTTCACCT 
5901 AACACTCTTA CTATTACTCC CGATACCCCT AAATTAACCT CCACACTACC TTCTACAATT AATCCGACTA ATACTCTAAC CACCTCAACC CAATCACCCC 
4O01 ATATTCAACG TACAATTTCT CCTAATACAG TAAATCTTAC ACCAACCACT CCTCATTTAA CTATTC&AAA TACTCCAAAA CTTCAACCCA AAAATCGACC 
4101 TCCAACCTTA ACTCCTCAAT CACCCAAATT AACCACCCAA ACACCCTCTA CCATTACCTC AACCAATCCT CACACAACTC TTACACCCAA CGATACCACT 
4201 AT C CCAGCAA ACATTAATCC TCCTAATCTC ACCTTAAATA CCACACGCAC TTTAACTACT ACACCCCA7T CAAAGATTAA CCCAACCACT ^TACCTTAA 
4501 CAATCAATCC AAAACATGCC AAATTACATC CTCCTCCATC ACCTCACCCC ACACTAGTAA ATCCAACTAA CCCAACTCCC TCTCCTAAC6 TCACTCCCAA 
4401 AACCTCAACC ACCCTCAATA TCACCC6GCA TTTAAACACA ATAAATCCGT TAAATATCAT TTCCGAAAAT CGTAGAAACA CTCTCCCCTT AAGACCCAAC 
4501 CAAATTGATG TGAAATATAT CCAACCAGCT GTACCAACCC TACAA6ACCT AATTCAACCC AAACCCGTCC TTCAGAACCT AAAACATTTA TCTCAT6AAC 
4601 AAAGAGAAAC ACTACCCAAA CTT6CTCTAA 6TCCTCTACC TTTCCTTCAG CCAAATAATC CCATTACCCT TAATACACAA AACCACTTTA CAACCAAACC 
4701 ATCAAGTCAA CTCACAATTT CTCAACCTAA CCCCTGTTTC TCAACICCTA ATCCCCCACG ACTATCTACC AATCTTCCTG ACCATCCACA CCAC 

HMW4 nucleotide sequence 

REFORMAT of: fMBP^.Ccg check: -1 from: 1 to: 4803 October 5, 1995 17:44 
(No documentation) 
Hmw4.Gcg Length: 4803 October 5, 1995 18:29 Type: N Check: 3920 

1 ATGAAOUGA TATATCCTCT CAAATTCACC AAACCCCTGA ATCCTTTCCT TCCTCTGTCT CAATTCACAC CCCCTTCTCA CCATTCCACA CAj^AJ'ACCCA 
T01 CTGAAAAACC TCTTCCTACG AAACTACGCC ACTTCGCCTT AAACCCA C TT TCCCCTATAT TCCTATCTTT CSCCATCCCA TCCATTCCCC AATCTCTTTT 
201 ACCCACCCfiT TTACACCCAA TCACCCTCCT ACACCGTACA CCAACCATCC AACTACACCC CAATAAAACC ACTATCCCTA ATACCCTCAA TCCTATCATC 
301 AATTCCAAAC AATTTAACAT T C A C CAAAAT CAAATCGTCC ACTTTTTACA ACAAACCACC AACTCTCCCC TTTTCAACCC TCTTACATCT GACCAAATCT 
401 CCCAATTAAA ACCCATTTTA CATTCTAACC CACAACTCTT TTTAATCAAC CCAAATCCTA TCACAATACC TAAACACCCA ATTATTAACA CTAATCCCTT 
SOI TACTCCTTCT ACCCTACACA TTTCTAACCA AAACATCAAC CCCCCTAATT TCACCCTTCA CCAAACCAAC CATAAACCAC TCCCTCAAAT CCTCAATCAC 
601 CCTTTAAnA CCCTTCCTAA ACACCCTACC CTAAACCTTA TTCGTCCCAA ACTCAAAAAC CACCCCCTCA TTACCGTAAA TCCCCCTACT ATTTCTTTAC 
701 . TTCCACCGCA AAAAATCACC ATCACCCATA TAATAAATCC AACCATCACT TACACCATTC CTCCACCTCA AAACCAACCC ATCAATCTCC CCGATATTTT 
601 TC CC AAACCT CCTAACATTA ATCTCCCCCC TCCCACTATT CGCAATAAAC CTAAACTTTC TCCCCACTCT CTAACCAAAC ATAAAACTCC TAACATT6TT 
901 CTCTCTCCCA AA6AACCTCA ACCCCAAATT CCCCCTC7AA TTTCCCCTCA AAATCACCAA CCCAAACCTC CTAA6TTCAT GATTACACCT CATAAACTCA 
1001 CATTAAAAAC AGC7CCACTT ATCCACCTTT CACCTAAAGA ACCGGCACAC ACTTATCT7C CCCGTGATGA GCCTCCCCAA CCTAAAAATC GTATTCAATT 
1101 iC Ctt AA C AAA ACCTCTTTAG AAAAACC C TC CACAATTAAT CTATCACCCA AAGAAAAACG CCCCCCCCCT ATTCTATCCC CCCATATTCC ATTAATTAAT 
1201 CCTAACATTA ATCCTCAACC TACCCATATT CCTAAAACTC CCCCCTTTCT CCAAACATCA CCACATCACT TATCCATTCC TCATGATGTC ATTGTTGACC 
1301 CTAAACACTC CTTATTAGAC CCAGATCATC TGTCCATTGA AACTCTTACA TCTGCACCCA ATAATACCCC CGAAAACCAA CCATATACAA CAGGAGATCG 
1401 CACTAAACAC TCACCTAAAC CTAATACTAT TTCTAAACCT ACATTAACAA ACTCAACTCT TCACCAAATC CTAAGAACA6 CTTCTTATCT TAATATCACT 
1S01 CCTAATAATA CAAHTATCT TAATACCTCC ATCAACTTAT CTAATCCCAC TTTAACACTT CACACTAAAC CACATCCACT TAAAATTAAC CGTGATATTA 
1601 CCTCAAACfiA AAATCCTAAT TTAACCATTA AACCA6CCTC TTCCCTTCAT CTTCATAAAA ACATCACCa TCCTACCCCT TTTTTCAAIA TTCTCCCTCC 
1701 CCATiaCTA CCntlGAfiA GAGACCCCGA TAAACCA C CT AAC6CAACAC ATCCTCAAAT TACCCCACAA CCCACCATAA CCCTCAATAA ACATCATAAA 
1S01 CAATTTAGAT TCAATAATCT ATCTATTAAC R n fU CCC CCA ACGGTTTAAA CTTTATTCCA AATCAAAATA ATTTCACTCA TAAATTTGAT CCCCAAATTA 
1901 ACATATCTCG AATAGTAACA ATTAACCAAA CCACGAAAAA ACATGTTAAA TAC7CGAATC CATCAAAACA CTCTTACTCC AATCTTTCTT CTCTTACTTT 
2001 CAATACCCTC CAAAAATTTA CCTTTATAAA ATTCCTTCAT ACCGCCTCAA ATTCCCAACA TTTGAGCTCA TCACGTAGAA GTTTTGCACG CCTACATTTT 
2101 AAC6CCATC6 CACCCAAAA C AAACTTCAAC ATCCCACCTA ACCCAAAACC CTTATTTAAA TTAAAACCAA ACCCCCCTAC ACACCCAAAA AAAfiAATTAC 
2201 CTATTACTTT TAACC CC AA C ATTACACCTA CCCCTAACAC T6ATACCTCT CTGATCTTTG ACATACACCC CAATCTTACC TCTAGACCTG CCCGCATAAA 
2301 CATCGATTCA ATTAACATTA CCCCCCGCCT TGACTTTTCC ATAACATCCC ATAATCCCAA TACTAATCCT TTTCAAATCA AAAAACACTT AACTATAAAT 
2401 CCAACTCCCT CCAATTTTAC TCTTAACCAA ACGAAAGATT CTTTTTATAA TCAATACACC AAACACCCCA TTAACTCAAG TCATAATCTA ACCATTCTTC 
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2501 CCCCCAATCT CACTCrACCT CCCCAAAATT CAACCACTAC CATTACCCCC AATArCAATA TCACCAATAA ACCAAATCTT ACATTACAAC CTCACACCaC 
2601 CAACACCAAC ACACCCTTCA ACAAAAGAAC TCTAACTCTT CCCAATATAT CTGTTCACCG CAATTTAACC CTAACTCCTC CAAATCCAAA CATTCrcCCC 
2701 AATCTTTCTA TTCCACAAC* TTCCACATTT AAACCACAAC CCACTCACAA CCTAAACATC ACCCCCACCT TTACCAACAA CCCTACCCCC AACATTAATA 
2B01 TAAAACAACC ACTGGTAAAA CICCAACCCC ATATTATCAA TAAACCTCCT TTAAATATCA CTACTAACCC CTCACCCACT CAAAAAACCA TTATTAACCC 
2901 AAATATAACT AACCAAAAAC CCCACTTAAA CATCAA&AAT AT7AAACCC6 ACCCCCAAAT CCAAATTCCC CCCAATATCV CACAAAAACA ACCCAATCTC 
JOOl ACAATTTCTT CTCATAAACT AAATATTACC AATCACATAA CAATCAAACC ACGCCTTCAA CCCCCCCCTT CTGATTCAAC TGACCCAGAA AATCCTAACC 
3101 TAACTATTCA AA CC AAACAC TTAAAATTCC CACCACACCT AAATATTTCA CCCTTTAATA AACCAGAAAT TACACCTAAA AATCCCACTC ATTTAACTAT 
5201 TCGCAATCCT ACCCCTCCTA ATCCTCATCC TAAAAAACTC ACTTTTCACA ACCTTAAACA TTCAAAAATC TCCACTCACC CTCACAATCT AACACTAAAT 
3301 ACCCAACTGA AAACCTCTM TCCTACTACC AATCCTCCTA ATCATAACAG CACCCCTTTA ACCATTTCCC CAAAACATCT AACCGTAAAC AATAACCTTA 
3401 CCTCCCACAA CACAATAAAT ATCTCTCCCC CACCACCAAA TCTAACAACC AAACAACCCA CAACTATCAA TCCAACCACA CCCACCCTCC AAGTAACTCC 
3501 TCAAAATCCT ACAATTAAAC CCAACATTAC CTCCCAAAAT CTAACACTCA CACCAACACA AAATCTTCTT ACCACACACA ATCCTCTCAT TAATCCAACC 
3601 ACCCCCACAC TAAACATTAC TACAAAAACA CCCGATATTA AACCTCCAAT TCAATCAACT TCCCCTAATC TAAATATTAC ACCCACCCCC AATACACTTA 
3701 AGCTAACTAA TATCACTCCT CAACATCTAA CACTAACACC CGATCCACCA CCCTTCACAA CTACACCACC CTCAACCATT ACTCCCACAA CACCCAATCC 
3e01 AAATATTACA A C CAAAACAC CTCATATCAA CCCTAAACTT CAATCCACCT CCCCCTCTCT AACACTTCTT CCAACIGCAC CAACTCTTCC TGTACCTAAT 
3901 ATTTCACCTA ACACTCTTAC TATTACTCCC CATACCCCTA AATTAACCTC CACACTACCT TCTACAATTA ATCCCACTAA TACTCTAACC ACCTCAACCC 
4001 AATCACC CC A TATTCAACCT ACAATTTCTC CTAATACAGT AAATCTTACA GCAACCACTC CTCATTTAAC TATTCCAAAT ACTCCAAAAC TTGAACCCAA 
4101 AAATCCACCT CCAACCTTAA CTCCTCAATC ACCCAAATTA ACCA CCC AAA CACCCTCTAC CATTACCTCA ACCAATCCTC AGACAACTCT TACACCCAAC 
4201 CATACCACTA TCCCACCAAA CATTAATCCT CCTAATCTCA CCTTAAATAC CACACCCACT TTAACTACTA CACCCCATTC AAACATTAAC CCAACCACTG 
4301 CTACCTTAAC AATCAAT6CA AAAGATCCCA AATTACATCC TCCTCCATCA CCTCACCCCA CACTAGTAAA TCCAACTAAC CCAACTCCCT CTGCTAACCT 
4401 CACTCCCAAA ACCTCAACCA CCCTCAATAT CACCCCCCAT TTAAACACAA TAAATCCCTT AAATATCATT TCCGAAAATC CTACAAACAC TCTCCCCTTA 
4501 ACACCCAACC AAATTCATCT CAAATATATC CAACCAGCTC TACCAAGCGT ACAA6ACCTA ATTCAACCCA AACCCCTCCT TCACAACCTA AAACATTTAT 
4401 CTCATCAACA AAGAGAAACA CTACCCAAAC TTCCTCTAAC TCCTCTACCT TTCCTTCACC CAAATAATCC CATTACCCTT AATACACAAA ACCACTTTAC 
4701 AACCAAA CC A TCAACTCAAC T6ACAATTTC TCAACCTAAC raSTCTTTCT CAACTCCTAA TGCCCCACCA CTATCTACCA ATCTTCCTCA C6ATCCACAC 
4801 CAC 

FIG. J, Western immunoblot assay of cell sonicates prepared 
from E. coli transformed with plasmid pT7-7 (lanes 1 and 2), 
pHMW1-2 (lanes 3 and 4), pHMW1-4 (lanes 5 and 6), or pHMW1-14 
(lanes 7 and 8). The sonicates were probed with an E. coli-absorbed 
adult serum sample with high-titer antibody against 
high-molecular-weight proteins. Lanes labeled U and I represent 
sonicates prepared before and after induction of the growing 
samples with IPTG, respectively. The arrows indicate protein bands 
of interest as described in the text. 






[Immunoblot image. Molecular mass markers: 200K, 116K, 43K. Legible lane labels: 5, 7, 12, 14, 15.] 

FIG. Western immunoblot assay of cell sonicates from a panel 
of epidemiologically unrelated nontypeable H. influenzae strains. 
The sonicates were probed with rabbit antiserum prepared against 
HMW1-4 recombinant protein. The strain designations are indicated 
by the numbers below each lane. 

FIG. Western immunoblot assay of cell sonicates from a panel 
of epidemiologically unrelated nontypeable H. influenzae strains. 
The sonicates were probed with monoclonal antibody X3C, a 
murine IgG antibody which recognizes the filamentous hemagglutinin 
of B. pertussis (13). The strain designations are indicated by the 
numbers below each lane. 
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